Author: Finn Deike
Welcome to my interactive RTutor problem set, covering the main results from the paper Credit Elasticities in Less-Developed Economies: Implications for Microfinance by Dean S. Karlan (Department of Economics, Yale University) and Jonathan Zinman (Department of Economics, Dartmouth College), published in the American Economic Review in 2008.
The paper, the data and a supplemental appendix are available online at the following websites:
- Paper: https://www.aeaweb.org/articles?id=10.1257/aer.98.3.1040
- Data: https://assets.aeaweb.org/asset-server/articles-attachments/aer/data/june08/20070848_app.pdf
- Appendix: https://assets.aeaweb.org/asset-server/articles-attachments/aer/data/june08/20070848_app.pdf
Data Overview
Experimental Design
Theoretical Model
Price Elasticity Results
4.1 Price Sensitivities of Loan Take-Up
4.2 Price Sensitivity of Loan Size
5.1 Profit Maximization
5.2 Targeted Access
Maturity Elasticities
Conclusion
References
This interactive problem set reproduces the results of the above-named paper. The results are presented in code chunks, theoretical explanations, info boxes and quizzes. The procedure is very simple: you solve tasks by entering or editing R code chunks and/or answering short quizzes. Before solving a task you have to press the Edit button to make the code chunk editable. After editing the chunk or answering a quiz question, press the Check button to get feedback on whether your answer is correct. Some exercises are more difficult and you might need advice to solve them; in that case you can press the Hint button. Besides code chunks and quizzes, the problem set contains info boxes, which provide additional information on variables and explanations of statistical models or R commands.
Good work and patience will be rewarded with interesting awards, which include some additional information on exercise-related topics!
“If you go out into the real world, you cannot miss seeing that the poor are poor not because they are untrained or illiterate but because they cannot retain the returns of their labor. They have no control over capital, and it is the ability to control capital that gives people the power to rise out of poverty.”
― Muhammad Yunus, Banker to the Poor: Micro-Lending and the Battle Against World Poverty (https://microfinancingafrica.org/10-profound-quotes-about-microfinance/)
Muhammad Yunus is a Bangladeshi social entrepreneur, banker, economist, civil society leader and Nobel Prize winner. He is one of the pioneers of microcredit and microfinance and the founder of the “Grameen Bank”, which is also known as “The Bank of the Poor”. In his book “Banker to the Poor” (1997), Yunus describes how he devoted the Grameen Bank to providing the poorest people of Bangladesh with minuscule loans. The quintessence of the book is that even a small amount of credit can transform the lives of the poorest people in the world.
The book inspired me to find out more about lending microcredit to the poorest in the world, and therefore I decided to write my Bachelor thesis about this topic.
Over three billion people in developing countries are still without effective access to loan and deposit services. The problem is particularly acute in Sub-Saharan Africa, where only between five and twenty-five percent of households have a formal relationship with a financial institution. The region is also home to just two percent of the world’s microfinance institutions. Lack of access to financial services is therefore one of the largest constraints to private sector development in Africa. Addressing this shortfall requires creating new institutions and building operational and managerial capacity from the ground up. (IFC) Providing access to microcredit is expensive for lenders in view of the high transaction costs relative to the small amounts borrowed. Therefore, to encourage micro-lending, profit-driven microfinance organizations, including retail and furniture stores, are permitted to charge interest rates that are higher than those payable on debt procured from the formal financial sector. This, in turn, makes microcredit expensive for borrowers and even more attractive for lenders.
Microfinance institutions (MFIs) are often pushed by policymakers to increase interest rates in order to reduce their reliance on subsidies. However, this only makes sense if the poor really are insensitive to interest rates; otherwise, increasing interest rates would limit access. Many economic models suggest that loan pricing is tightly related to reliance on subsidies and therefore also to the functioning of the MFI market. However, there is little evidence on interest rate sensitivities in MFI markets. Therefore we test the hypothesis of price-inelastic demand using randomized trials conducted by a high-risk consumer lender in South Africa: a field experiment that sent direct mail offers with randomized individual interest rates, based on each client’s prior rate, to more than 50,000 former clients of the lender. The loan price, though, is not the only factor that might influence demand and thus MFI profit. Borrowers with low liquidity might also respond to maturity, because longer maturities reduce the monthly payment and thereby increase cash flow. Maturity might have the same or an even larger impact on the demand for credit than the loan price.
We will observe the repayment behavior of the clients by identifying demand curves for consumer credit, randomizing both the offered interest rate and the maturity of an example loan. Furthermore, we examine maturity elasticities of demand using exogenous variation in maturities engineered by a randomly assigned, nonbinding example maturity (four, six or twelve months) presented in some direct mailers. The randomly assigned example maturity predicts the actual maturity chosen.
Good luck and have fun solving this interactive problem set to increase your knowledge about micro credit lending in South Africa!
In this first section we will get to know the data we use in this interactive R-Problemset.
The data set we are working with, kz_demandelasts_aer08.dta, is a Stata data file. It contains data on more than 50,000 former clients of a lender in South Africa. The information on the clients ranges from experimental variables to demographic characteristics.
Before we can take a look at our data set kz_demandelasts_aer08.dta, we have to load the data. To do so we use the R-command read_dta() from the haven package.
First we have to load the package with the library(package) command. The function read_dta(.data) from the haven package reads a file in Stata binary format into a data frame in R, so we can use it to read our data set directly.
For further information check: https://haven.tidyverse.org/reference/read_dta.html
1.1) First load the package haven with the library() command. If you think your answer is right press check:
# Load the package with the library() command
library(haven)
1.2) Now you can use the command read_dta() to load the data file kz_demandelasts_aer08.dta and store it in the variable stata_data.
# #
# ___ <- ___("~/Documents/GitHub/thesis_code_rep/kz_demandelasts_aer08.dta")
stata_data <- read_dta("~/Documents/GitHub/thesis_code_rep/kz_demandelasts_aer08.dta")
After loading in the data, we can now have a deeper look into our data and the according parameters. Let’s have a glance at our data stored in stata_data.
There are a variety of ways to get an overview of a data set. We will use the R function head() to show the first six rows of our data set stored in stata_data. An alternative is to show six random sample rows using the command sample_n(data, rows). This will help us get familiar with the variables and understand how the data set is structured.
1.3) Show the first six rows of our data file stored in stata_data. Afterwards, press the Check button; it will execute the function and show you whether you are right or wrong:
# Use the function head() to show the first six rows of the data frame
head(stata_data)
## sales_grossincome sales_netincome wave grossincome dependants dormancy
## 1 NA NA 1 NA 5 0
## 2 NA NA 1 NA 0 23
## 3 NA NA 1 NA 1 13
## 4 NA NA 1 NA 0 17
## 5 2.2536 3595.85 1 NA 1 3
## 6 NA NA 1 NA 0 1
## itcscore appscore lastterm lastamount risk itczero offer4 yearlong final4
## 1 601 36 4 600 MEDIUM 0 6.75 0 5.25
## 2 593 21 4 400 HIGH 0 13.25 0 13.25
## 3 672 32 4 1000 HIGH 0 12.50 0 12.25
## 4 697 30 4 1200 HIGH 0 4.25 0 4.25
## 5 605 29 4 300 MEDIUM 0 6.50 0 6.50
## 6 0 23 4 600 LOW 1 5.50 0 5.50
## trcount tookup tookup_afterdead_enforced applied loansize branchuse
## 1 3 0 0 0 0 CAB
## 2 1 0 0 0 0 CPL
## 3 2 0 0 0 0 CPL
## 4 1 0 0 0 0 CPL
## 5 2 0 1 0 0 CAB
## 6 17 0 0 0 0 CPL
## normrate_less pstdue_average married female edhi age province rural
## 1 1 NA 0 1 0 32.25188 Kwazulu-Natal 0
## 2 0 NA 0 0 0 30.40657 Kwazulu-Natal 0
## 3 0 NA 1 1 0 31.29363 Kwazulu-Natal 0
## 4 1 NA 0 0 0 24.85969 Kwazulu-Natal 0
## 5 1 NA 0 1 1 31.25530 Kwazulu-Natal 0
## 6 1 NA 0 1 0 62.51335 Kwazulu-Natal 0
## waved1 waved2 waved3 rejected low med onetermshown termshown termshown4
## 1 1 0 0 0 0 1 0 NA NA
## 2 1 0 0 0 0 0 NA NA NA
## 3 1 0 0 0 0 0 NA NA NA
## 4 1 0 0 0 0 0 NA NA NA
## 5 1 0 0 0 0 1 0 NA NA
## 6 1 0 0 0 1 0 0 NA NA
## termshown6 termshown12 term high itcany appscore0 tookup_outside_only
## 1 NA NA 0 0 1 0 0
## 2 NA NA 0 1 1 0 0
## 3 NA NA 0 1 1 0 1
## 4 NA NA 0 1 1 0 0
## 5 NA NA 0 0 1 0 0
## 6 NA NA 0 0 0 0 0
## normrate_more grossinterest lntrcount lnage lnitcscore lnappscore
## 1 0 0 1.09861 3.47358 6.39859 3.58352
## 2 1 0 0.00000 3.41466 6.38519 3.04452
## 3 1 0 0.69315 3.44341 6.51026 3.46574
## 4 0 0 0.00000 3.21325 6.54679 3.40120
## 5 0 0 0.69315 3.44219 6.40523 3.36730
## 6 0 0 2.83321 4.13538 NA 3.13549
## lnlastamount lnoffer4 lnloansize
## 1 6.39693 1.90954 NA
## 2 5.99146 2.58400 NA
## 3 6.90776 2.52573 NA
## 4 7.09008 1.44692 NA
## 5 5.70378 1.87180 NA
## 6 6.39693 1.70475 NA
sample_n() is a function from the dplyr package used to select random samples in R. It selects n random rows from a data frame: the first parameter is the data frame, the second parameter tells R the number of rows to select.
For further information check: https://dplyr.tidyverse.org/reference/sample.html
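As a quick illustration (using a small made-up data frame, not our kz_demandelasts_aer08.dta data), sample_n() can be tried like this:

```r
library(dplyr)

# Hypothetical toy data frame with 10 "clients"
clients <- data.frame(id = 1:10, offer = seq(3.25, 14.75, length.out = 10))

set.seed(42)           # fix the random seed so the draw is reproducible
sample_n(clients, 3)   # returns 3 randomly selected rows
```

Without set.seed(), every call returns a different random selection of rows.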
1.4) Now try sample_n() in order to show six random rows of our data set stata_data.
sample_n(stata_data,6)
## sales_grossincome sales_netincome wave grossincome dependants dormancy
## 1 2 1896 2 3.54045 1 15
## 2 NA NA 3 4.99950 4 11
## 3 NA NA 3 5.88000 2 9
## 4 NA NA 2 2.62973 2 8
## 5 NA NA 3 8.50000 1 2
## 6 NA NA 3 2.87700 4 4
## itcscore appscore lastterm lastamount risk itczero offer4 yearlong final4
## 1 691 37 4 800 HIGH 0 7.99 0 7.49
## 2 517 38 4 1200 HIGH 0 9.99 0 7.49
## 3 506 37 4 600 HIGH 0 4.49 0 4.49
## 4 644 20 4 1000 HIGH 0 7.99 1 7.99
## 5 623 40 4 1800 HIGH 0 10.25 0 10.25
## 6 592 40 4 800 HIGH 0 8.88 1 8.88
## trcount tookup tookup_afterdead_enforced applied loansize branchuse
## 1 1 1 0 1 1000 CDM
## 2 1 0 0 0 0 CUT
## 3 9 0 0 0 0 CLT
## 4 3 0 0 0 0 CIM
## 5 1 0 0 0 0 CGM
## 6 1 0 0 0 0 CKP
## normrate_less pstdue_average married female edhi age province
## 1 1 0 0 1 1 28.19165 Kwazulu-Natal
## 2 1 NA 1 0 0 48.23272 Eastern Cape
## 3 1 NA 1 0 1 37.74675 Limpopo Province
## 4 1 NA 1 0 0 31.93977 Kwazulu-Natal
## 5 1 NA 1 0 1 53.32238 Gauteng
## 6 1 NA 1 0 0 56.39699 Gauteng
## rural waved1 waved2 waved3 rejected low med onetermshown termshown termshown4
## 1 0 0 1 0 0 0 0 NA NA NA
## 2 0 0 0 1 0 0 0 NA NA NA
## 3 1 0 0 1 0 0 0 NA NA NA
## 4 0 0 1 0 0 0 0 NA NA NA
## 5 0 0 0 1 0 0 0 NA NA NA
## 6 0 0 0 1 0 0 0 NA NA NA
## termshown6 termshown12 term high itcany appscore0 tookup_outside_only
## 1 NA NA 4 1 1 0 0
## 2 NA NA 0 1 1 0 0
## 3 NA NA 0 1 1 0 1
## 4 NA NA 0 1 1 0 0
## 5 NA NA 0 1 1 0 1
## 6 NA NA 0 1 1 0 0
## normrate_more grossinterest lntrcount lnage lnitcscore lnappscore
## 1 0 319.6 0.00000 3.33903 6.53814 3.61092
## 2 0 0.0 0.00000 3.87604 6.24804 3.63759
## 3 0 0.0 2.19722 3.63090 6.22654 3.61092
## 4 0 0.0 1.09861 3.46385 6.46770 2.99573
## 5 0 0.0 0.00000 3.97636 6.43455 3.68888
## 6 0 0.0 0.00000 4.03242 6.38351 3.68888
## lnlastamount lnoffer4 lnloansize
## 1 6.68461 2.07819 6.90776
## 2 7.09008 2.30158 NA
## 3 6.39693 1.50185 NA
## 4 6.90776 2.07819 NA
## 5 7.49554 2.32728 NA
## 6 6.68461 2.18380 NA
Both functions show you six rows of the sample data frame, where each row represents one of the 58,168 clients from 86 mostly urban branches who had borrowed from the lender in the past 24 months and did not currently have a loan from the lender as of 30 days prior to the mailer. Furthermore, we can see 54 columns; every column stands for one variable.

We will take a particularly close look at a series of experimental variables. These include, inter alia, the three rates which were assigned to each client: offer4, a randomized individual interest rate offered by direct mail; final4, a contract rate that was slightly less than the offer rate; and yearlong, the dynamic repayment incentive that extended preferential contract rates for up to one year. In addition, the three example maturities presented in some mailers, termshown4, termshown6 and termshown12 (four, six and twelve months), give a prediction of the actual maturity chosen. The variables tookup, applied and onetermshown indicate the clients who borrowed, those who applied and those who were eligible for the maturity suggestion randomization. The loan size is described by the variable loansize.

Moreover, we will examine demographic characteristics such as: female, married, age, edhi (more educated), rural, dependants (number of dependants), grossincome (gross monthly income, 000s of rand), trcount (number of loans with the lender), dormancy (number of months since the last loan with the lender) and the three risk categories low (low risk), med (medium risk) and high (neither medium nor low risk). Since the remaining variables are mostly not relevant for our analysis, I will not explain them here. If you are interested in them, you can press the Data button to get to the Data Explorer, where you will find descriptions of all variables.
You successfully loaded our data file and got to know our important variables. Now we have a basis to build our analysis on! The founder of RStudio, JJ Allaire, would be proud of you!
But how many clients applied for the loan on time and actually were assigned to borrow from the lender?
To answer this question we could extract all the clients from our data set who actually accepted the offer, were approved and ended up taking the loan. To subset a data frame we can use the function filter() from the dplyr package.
The filter(.data, condition, ...) function from the dplyr package is used to subset a data frame, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. The first parameter is the data frame, followed by one or more conditions, e.g. filter(stata_data, female == 1). Further arguments become relevant when the data input is grouped; for now this is not the case, so we will ignore them temporarily.
For further information check: https://dplyr.tidyverse.org/reference/filter.html
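To see how filter() behaves before applying it to our data set, here is a minimal sketch with a toy data frame (the column names mimic ours; the values are made up):

```r
library(dplyr)

# Toy data: four hypothetical clients
clients <- data.frame(female = c(1, 0, 1, 1),
                      tookup = c(1, 0, 0, 1))

# Keep only the rows where every condition evaluates to TRUE
filter(clients, female == 1, tookup == 1)   # returns 2 rows
```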
1.5) Filter the main data frame stata_data by using filter() with the condition tookup == 1 to get the sample data frame borrowed. Just fill in the remaining commands. Afterwards press Check.
# ___ <- stata_data %>% filter(___)
borrowed <- stata_data %>% filter(tookup == 1)
Now we have created a data frame which only contains clients who borrowed from the lender. But we still don’t know how many. You can press the Data button to get to the Data Explorer, which will show you the number of rows of the sample frame borrowed. This number corresponds to the number of borrowers.
Quiz: How many clients borrowed from our lender?
53810 [ ]
4540 [ ]
3887 [x]
3096 [ ]
Another aspect that could be interesting is to analyze the geographic circumstances. Our data frame stata_data contains a column named province, which contains the province in which the applicant applied for a loan. Hence we can create a new data frame grouped by the provinces using the functions group_by() and summarise().
The group_by(.data, ...) function from the dplyr package takes an existing data frame and converts it into a grouped data frame, where operations will be performed per group. Grouped data is often used with the function summarise() from the same package: summarise() applies operations to each group.
For further information check: https://dplyr.tidyverse.org/reference/group_by.html and https://dplyr.tidyverse.org/reference/summarise.html
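A minimal sketch of the group_by()/summarise() pattern, using a made-up data frame with the same column names as ours:

```r
library(dplyr)

# Toy data: three hypothetical loans in two provinces
loans <- data.frame(province = c("Gauteng", "Gauteng", "Free State"),
                    offer4   = c(7.5, 8.5, 6.0))

loans %>%
  group_by(province) %>%
  summarise("Average Interest Rate" = mean(offer4),
            "Number of Clients"     = n())
# one row per province: Free State 6.0 (1 client), Gauteng 8.0 (2 clients)
```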
1.6) Group the data frame borrowed by the province the applicant applied in, using the function group_by(). The new data frame should be named provinces. Then use the function summarise() to apply the given operations to each group. Afterwards, show the new data frame provinces.
# ___ <- ___ %>% ___ %>% ___("Average Interest Rate" = round(mean(offer4, na.rm=TRUE),3),
# "Average Offer Rate" = round(mean(final4, na.rm=TRUE),3),
# "Average Dynamic Repayment Incentive" = round(mean(yearlong, na.rm=TRUE),3),
# "Average Loansize" = round(mean(loansize, na.rm=TRUE),3),
# "Average Maturity" = round(mean(term, na.rm=TRUE),3),
# "Number of Clients" = sum(tookup)
# )
#
# # show the data frame
#
provinces <- borrowed %>% group_by(province) %>% summarise("Average Interest Rate" = round(mean(offer4, na.rm=TRUE),3),
"Average Offer Rate" = round(mean(final4, na.rm=TRUE),3),
"Average Dynamic Repayment Incentive" = round(mean(yearlong, na.rm=TRUE),3),
"Average Loansize" = round(mean(loansize, na.rm=TRUE),3),
"Average Maturity" = round(mean(term, na.rm=TRUE),3),
"Number of Clients" = sum(tookup)
)
## `summarise()` ungrouping output (override with `.groups` argument)
provinces
## province Average.Interest.Rate Average.Offer.Rate
## 1 Eastern Cape 7.582 7.059
## 2 Free State 7.374 6.677
## 3 Gauteng 7.463 6.915
## 4 Kwazulu-Natal 7.183 6.382
## 5 Limpopo Province 7.350 6.678
## 6 Mpumalanga 7.948 7.458
## 7 North West 10.490 6.750
## 8 Western Cape 7.610 6.988
## Average.Dynamic.Repayment.Incentive Average.Loansize Average.Maturity
## 1 0.489 1511.607 4.964
## 2 0.688 959.375 3.906
## 3 0.506 1512.028 4.950
## 4 0.416 1375.343 4.748
## 5 0.553 1597.970 5.244
## 6 0.545 1447.727 5.409
## 7 0.000 1000.000 4.000
## 8 0.538 1313.034 4.333
## Number.of.Clients
## 1 280
## 2 64
## 3 1268
## 4 1821
## 5 197
## 6 22
## 7 1
## 8 234
As you can see, our newly created data frame is grouped into eight provinces. Though South Africa is actually divided into nine provinces, the province Northern Cape is missing because no client applied for a loan there. We want to create an interactive map of South Africa later in this problem set, so we have to add the province Northern Cape to our data frame provinces.
1.7 a) Just press Check to add the Northern Cape to the data frame.
provinces <- rbind(provinces, c("Northern Cape", NA, NA, NA, NA, NA, NA))
provinces$province <- c("Eastern Cape", "Free State", "Gauteng", "KwaZulu-Natal", "Limpopo", "Mpumalanga", "North West", "Western Cape", "Northern Cape")
names(provinces)[names(provinces) == "province"] <- "NAME_1"
As you can see the provinces of South Africa are: Eastern Cape, Free State, Gauteng, KwaZulu-Natal, Limpopo, Mpumalanga, Northern Cape, North West and Western Cape.
Before we continue with our analysis, try to answer the short question:
Quiz: Why did so many clients apply for a loan in Gauteng and none in the Northern Cape?
The Northern Cape is a highly urbanized province and Gauteng consists mostly of sedimentary rocks. [ ]
Gauteng is a highly urbanized province and the Northern Cape consists mostly of sedimentary rocks. [x]
The provinces of South Africa vary substantially in size. The smallest but most crowded one is Gauteng, a highly urbanized region that includes the cities of Johannesburg, Ekurhuleni (East Rand) and Pretoria, three of the five largest cities in the country. On the other hand, the largest but most sparsely populated province is the Northern Cape. For comparison, this province is slightly larger than Germany (Government of South Africa: “South Africa’s Provinces”, at: https://www.gov.za/about-sa/south-africas-provinces, retrieved 20 January 2021). To get a better idea of what we are talking about, let us import data on the population of every province. To do this we can add a new variable, or rather column, to the data frame. A helpful function for adding a variable to a data frame is mutate().
The mutate() function from the dplyr package is used for creating new variables. A mutate() call is typically divided into three parts: first the data frame you want to modify, second the name of the variable you want to create, and lastly the values you want to assign to the new variable: mutate(dataframe, new_variable = new_values)
For further information check: https://dplyr.tidyverse.org/reference/mutate.html
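Before using mutate() on provinces, a small sketch with made-up values shows the pattern:

```r
library(dplyr)

# Toy data frame with two provinces
df <- data.frame(province = c("Gauteng", "Free State"))

# Add a new column; the vector must contain one value per row
df <- mutate(df, population = c(15488137, 2928903))
df
```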
1.8) Add a new column to our existing data frame provinces which includes the population of each province, using the mutate() function. You can use the given vector c(6734001, 2928903, 15488137, 11531628, 5852553, 4679786, 4108816, 7005741, 1292786) as the values we want to assign to the new variable Population. When you have finished the task, press Check.
# # Replace the question marks with your answer
# provinces <- ???(??? = ???)
provinces <- mutate(provinces, "Population" = c(6734001, 2928903, 15488137, 11531628, 5852553, 4679786, 4108816, 7005741, 1292786))
The population data is provided by the South African government. Check out https://www.gov.za/about-sa/south-africas-provinces# for more information.
1.9) Press Check in order to get an interactive map of South Africa. The color portrays the average interest rate and the bubble size the population of the selected province. You can get information about a province by simply clicking on it.
library(sf)
library(ggplot2)
library(tmap)
library(tmaptools)
library(leaflet)
library(dplyr)
# set options so numbers don't display as scientific notation
options(scipen=999)
# reads the features from the shapefile
mymap <- st_read("~/Documents/GitHub/thesis_code_rep/gadm36_ZAF_shp/gadm36_ZAF_1.shp", stringsAsFactors=FALSE)
## Reading layer `gadm36_ZAF_1' from data source `/Users/finndeike/Documents/GitHub/thesis_code_rep/gadm36_ZAF_shp/gadm36_ZAF_1.shp' using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: 16.45189 ymin: -34.83514 xmax: 32.89125 ymax: -22.12503
## geographic CRS: WGS 84
# joining the data of the shapefile and our dataframe
map_data <- inner_join(mymap, provinces)
## Joining, by = "NAME_1"
map_data$`Average Interest Rate` <- as.numeric(map_data$`Average Interest Rate`)
map_data$`Average Loansize` <- as.numeric(map_data$`Average Loansize`)
mymap <- tm_shape(map_data) +
tm_polygons("Average Interest Rate",
id = "NAME_1",
style = "quantile",
palette="Greens",
popup.vars=c("Population", "Number of Clients", "Average Interest Rate", "Average Offer Rate", "Average Dynamic Repayment Incentive","Average Loansize", "Average Maturity")) +
tm_bubbles(size = "Population", col= "black", id="NAME_1", scale = 2) +
tm_borders() +
tm_scale_bar()
tmap_leaflet(mymap)
## Warning: One tm layer group has duplicated layer types, which are omitted. To
## draw multiple layers of the same type, use multiple layer groups (i.e. specify
## tm_shape prior to each of them).
## Legend for symbol sizes not available in view mode.
# tmap mode set to interactive leaflet map
Great, you created a map of South Africa which shows the most important average characteristics of the loan contracts for each province! Maybe you now get a glimpse of how Bartolomeu Dias felt when he became the first European to explore the coastline of South Africa!
The map gives us information about how the clients are geographically distributed. We can see that a majority of the clients are from the Gauteng and KwaZulu-Natal provinces, which are also the provinces with the most residents, as portrayed by the size of the black bubbles. As mentioned earlier, we have no clients in the Northern Cape and only one in the North West. If we take a look at the average interest rate, portrayed by color, we see that it varies from 7.18 in KwaZulu-Natal to 10.49 in North West. However, the average interest rate in North West is not very informative because, as we know, there is only one client there. The average interest rates of the remaining provinces are very similar. For further information, just click on the province you want to learn more about.
In the last section we loaded the data file, got a rough look at our data and created an interactive map of South Africa. Now we want to find out more about the experiment. In this part we will discuss the details of the experimental design, its implementation and the validation of the random assignments. We will take a particular look at the earlier-utilized interest rate offer4.
Let’s start with the application process. Clients with good repayment histories received a limited-time offer including a randomized interest rate from our lender. Those who were eligible for maturities longer than four months also received a randomized example maturity of four, six or twelve months. The experiment was carried out in three mailer waves with start dates grouped geographically by branch: first a pilot test in three branches in July 2003, after which the experiment was expanded to the remaining 83 branches in two more mailer waves in September 2003 and October 2003. The offer rate randomization was based on the client’s pre-approved risk category. The standard risk schedule for four-month loans was subdivided into low-risk (7.75% per month), medium-risk (9.75% per month) and high-risk (11.75% per month). The randomization program randomly assigned each client an individual rate drawn from the distribution for their category. Each offer contained a deadline of between two and six weeks. A client accepted an offer by entering a branch office and filling out the application with a loan officer. The loan applications were evaluated per the lender’s standard procedure, independent of the experimental rates. Following the loan officer’s assessment, the clients were assigned a proportional loan size and maturity. The maturity randomization was orthogonal to the offer rate randomization, but only low- and medium-risk clients received the suggestion randomization: high-risk clients were not able to choose maturities longer than four months. The only value that was clearly stated on the letter and was not randomized was the loan size. The following graphic shows the operational steps of the experiment in detail:
Fig. 1 | Operational Steps of Experiment. (Karlan, Dean S. and Zinman, Jonathan, “Credit Elasticities in Less-Developed Economies: Implications for Microfinance” (2008))
2.1.1) To load in the data just press Check.
stata_data <- read_dta("~/Documents/GitHub/thesis_code_rep/kz_demandelasts_aer08.dta")
stata_data <- stata_data %>% mutate(itcscore_100 = itcscore/100, appscore_100 = appscore/100)
We want to find out more about the interest rate variable offer4. In this part of the section we will inspect the correlation of offer4 and maturity, conditional on the risk category, with other observable variables and check whether the randomization process was successful.
An interesting function for looking at the distribution of rates is geom_density(). The function is part of the package ggplot2, which is used for creating elegant data visualizations and is based on “The Grammar of Graphics”. geom_density() is often used in combination with the function ggplot(), which is also part of the ggplot2 package. We want to create a density estimate of the offer rate in relation to the three risk categories.
The ggplot() function from the ggplot2 package is a simple way to convert data in a data frame into plots using the data argument. The aesthetics function aes() defines an aesthetic mapping by selecting the specific variables to be plotted and specifying how to present them in the graph, e.g. the positions of the x- and y-values or other characteristics such as color, size or shape. The geom_ functions define the graphical representation of the plotted data, e.g. geom_point() for scatter plots, geom_line() for trend lines, geom_boxplot() for boxplots or geom_density() for a kernel density estimate, which is a smoothed version of the histogram. To add a geom_ to the template, use the + operator. A basic template of a ggplot() looks like the following: ggplot(data = .data, mapping = aes(x = ..., y = ...)) + geom_...()
For further information check: https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html
2.1.2) To create a density estimate of the offer rates, just press the Check button.
stata_data %>%
ggplot(aes(x=offer4, group=risk, fill=risk)) +
geom_density(adjust = 1.5, alpha = 0.4) +
scale_fill_discrete(name = "Risk Category", breaks = c("LOW", "MEDIUM", "HIGH")) +
xlab("Randomized Offer Rate (%)") +
ylab("Density") +
geom_vline(xintercept= c(7.75, 9.75, 11.75), col = c("green", "blue", "red"))
The graphic shows the distribution of the clients’ offer rates in relation to their risk category. We can see that the interest rates of most clients lie below their categorical standard schedule. The lowest randomized offer rate is slightly higher than 3% per month and the highest is slightly lower than 15% per month. Considering the different risk categories, it becomes apparent that the offer rate of a low-risk client has its highest density, around 0.3, between slightly less than 6% and slightly more than 7%. For clients in the medium-risk category, the offer rate has its highest density, about 0.22, between 7% and a bit more than 9%, whereas the density for the high-risk category peaks at about 0.16, around 7.5% and between 9% and 11%. These are rough readings from the graphic, but we can take a more detailed look at the distribution using simple mathematical operations.
There are a variety of mathematical operators which can be used in R, e.g. mean(.data) to calculate the average value of a data sequence, min(.data) to determine the minimal value, max(.data) to determine the maximum value, sum(.data) to calculate the sum of a data area, nrow(.data) to determine the number of rows of a data frame, or (.data)^n to raise a value to the power n.
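For example, applied to a short vector of made-up rates, these operators behave as follows:

```r
# Toy vector of hypothetical offer rates (percent per month)
x <- c(3.25, 7.75, 14.75)

min(x)    # 3.25
max(x)    # 14.75
mean(x)   # 8.583333
sum(x)    # 25.75
x^2       # 10.5625 60.0625 217.5625
```

For a data frame, nrow() counts the rows, which we will use below to compute proportions.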
2.1.3) Calculate the minimal (min_value) and maximum (max_value) values of the offer rate with regard to the risk categories by using the group_by() function and mathematical operations. To do so, create a new data frame values. When you have finished the task, press Check.
# create a data frame `values` which contains the min and max interest rate values of each risk category and show the output afterwards
values <- stata_data %>% group_by(risk) %>% summarise(min_value=min(offer4), max_value=max(offer4))
## `summarise()` ungrouping output (override with `.groups` argument)
values
## risk min_value max_value
## 1 HIGH 3.25 14.75
## 2 LOW 3.25 11.75
## 3 MEDIUM 3.25 13.75
The data frame shows that the offer rates vary from 3.25% to 14.75%.
Now we want to analyze how large the proportion of interest rates lower or higher than the standard rates is. The standard rate is the rate the client would have been charged if the experiment had not taken place. Fortunately we have the variables normrate_less and normrate_more, which state whether the offer rate is at or below, or rather above, the standard rate (1 = YES, 0 = NO). Therefore we can easily calculate the proportion of interest rates which are at or below and above the standard rates.
2.1.4) Calculate the proportion of interest rates at or below and above the standard rates. Check the info box Mathematical Operations for help.
#
sum(stata_data$normrate_less)/nrow(stata_data)
## [1] 0.988255
sum(stata_data$normrate_more)/nrow(stata_data)
## [1] 0.01174503
We can see that slightly more than 1 percent of the offers were above the standard rate and nearly 99 percent were at or below it.
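Note that for a 0/1 variable such as normrate_less, the proportion computed above is simply the variable's mean, so mean() is a shortcut for sum()/nrow(). A toy illustration with invented values:

```r
# Made-up 0/1 values, not the study data
df <- data.frame(normrate_less = c(1, 1, 1, 0, 1))

sum(df$normrate_less) / nrow(df)  # proportion of 1s: 0.8
mean(df$normrate_less)            # same result
```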
Earlier it was mentioned that the assigned rates are uncorrelated with other given information such as the external or internal credit score. Let us check whether this assumption holds. We will now run a simple linear regression with the lm() function and check whether the offer rate is actually unrelated to other observable characteristics. The value we will look at most closely is the p-value.
The basic R function lm() is used to fit linear models and determines whether or not there is a relationship between a dependent variable \(y\) and the independent variables \(x_1, x_2,...,x_n\). The R command summary() creates a concise summary of the fitted regression. Here is a basic template of an lm() regression and a summary() output:
reg <- lm(y ~ x1 + x2, data=data)
summary(reg)
For further information check: https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/lm and https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/summary
The p-value will give us information about the outcome of the randomization. If our hypothesis is correct, the p-value will be comparatively high for each variable and higher than the significance level.
The p-value is the level of marginal significance within a statistical hypothesis test: it measures the probability that a difference at least as large as the observed one could have occurred by random chance alone. The value always lies between 0 and 1. The lower the p-value, the stronger the evidence against the null hypothesis. A p-value below 0.05 is, generally speaking, considered statistically significant. In regression output, the significance level is indicated by stars.
Check out https://www.investopedia.com/terms/p/p-value.asp for further information.
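As a small illustration of how a test statistic relates to its p-value, assuming a standard normal (z) approximation (the numbers below are invented for illustration):

```r
# Two-sided p-value for a z statistic: the probability mass in both tails
z <- 1.96
p <- 2 * pnorm(-abs(z))
p  # roughly 0.05, the conventional significance threshold
```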
We will also add control variables for the month of the offer (waved2, waved3), the lender-defined risk level of the client prior to the experiment (low, med) and further characteristics of the client, to avoid endogeneity problems. Control variables are variables which are held constant during an experiment. They are neither the dependent nor the independent variables of interest; they are not part of the experiment itself, but they may still influence its outcome.
2.2.1) Task: Regress the offer rate offer4 on the given variables using the lm() regression and store it in the variable reg2_1. To do so, replace the question marks of the given code with your answer.
# reg2_1 <- ???(??? ~ dormancy + lntrcount + female + dependants + married + lnage + rural + edhi + itcscore_100 + itczero + appscore_100 + low + med + waved2 + waved3, data=stata_data)
#
# summary(???)
reg2_1 <- lm(offer4 ~ dormancy + lntrcount + female + dependants + married + lnage + rural + edhi + itcscore_100 + itczero + appscore_100 + low + med + waved2 + waved3, data=stata_data)
summary(reg2_1)
##
## Call:
## lm(formula = offer4 ~ dormancy + lntrcount + female + dependants +
## married + lnage + rural + edhi + itcscore_100 + itczero +
## appscore_100 + low + med + waved2 + waved3, data = stata_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.484 -1.423 0.461 1.844 6.167
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.650802 0.171447 50.458 < 0.0000000000000002 ***
## dormancy 0.001390 0.002005 0.693 0.488
## lntrcount 0.003727 0.013494 0.276 0.782
## female 0.023754 0.022137 1.073 0.283
## dependants 0.000203 0.006758 0.030 0.976
## married 0.016524 0.022839 0.723 0.469
## lnage -0.002388 0.048166 -0.050 0.960
## rural 0.019894 0.029366 0.677 0.498
## edhi -0.012782 0.022172 -0.577 0.564
## itcscore_100 0.005023 0.014248 0.353 0.724
## itczero 0.035417 0.096662 0.366 0.714
## appscore_100 -0.064632 0.135196 -0.478 0.633
## low -2.486245 0.038698 -64.247 < 0.0000000000000002 ***
## med -1.075115 0.041200 -26.095 < 0.0000000000000002 ***
## waved2 -0.291439 0.039424 -7.392 0.00000000000014625 ***
## waved3 -0.296611 0.038030 -7.799 0.00000000000000633 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.327 on 53538 degrees of freedom
## (256 observations deleted due to missingness)
## Multiple R-squared: 0.1121, Adjusted R-squared: 0.1119
## F-statistic: 450.7 on 15 and 53538 DF, p-value: < 0.00000000000000022
In fact, the p-value of every client characteristic is substantially higher than the significance level, which indicates that the offer rate is very likely independent of the other observable characteristics and thus that the randomization process was successful.
We can also visualize the effect that different explanatory variables have on the dependent variable of a regression model with the function effectplot() of the regtools package. The basic concept of an effect plot is to compare the effects of the explanatory variables when they change from their 10 percent to their 90 percent quantile or, for binary variables, from 0 to 1. To create an effect plot, we simply pass our regression to the function: effectplot(.reg) (for further information check out: https://github.com/skranz/regtools/blob/master/man/effectplot.Rd).
2.2.2) Create an effectplot() of our regression reg2_1. Afterwards, press the check button.
# Create an effectplot using the effectplot() function of the regression reg2_1
effectplot(reg2_1)
## Warning: Ignoring unknown aesthetics: ymax, ymin
The effect plot confirms our assumption that the randomization process was successful. The only explanatory variables which significantly influence our response variable are the control variables for the month of the offer and the risk category of the client.
Since we just learned that it is reasonable to assume that the randomization process was successful, we will now check whether offer rates at or below the standard rates influenced the clients who borrowed after the given deadline. For this we will use a probit regression, which is similar to a simple linear regression. The difference is that a probit regression is a binomial regression, meaning the outcome is either a success (1) or a failure (0), whereas in linear regressions the outcome is numerical.
The glm() function from the stats package is used to fit generalized linear models. It has a similar structure to the lm() regression, except that with the argument family we can specify which kind of linear model we want to use. In our case we want to use a probit link function. The function then looks as follows: glm(y ~ x1 + x2, family = binomial(link = "probit"), data = data).
Check out https://are.berkeley.edu/courses/EEP118/fall2010/section/13/Section%2013%20Handout%20Solved.pdf for further information.
The theoretical model of a probit regression will be explained more in detail in exercise 3.
2.2.3) Use a generalized linear model to run a probit regression of tookup_afterdead_enforced (take-up after the deadline) on the offer rate. Change the given code from a simple linear regression to a generalized linear probit regression. If you need help, press hint.
# reg2_2 <- lm(tookup_afterdead_enforced ~ offer4 + low + med + waved2 + waved3, data=stata_data)
# summary(reg2_2)
reg2_2 <- glm(tookup_afterdead_enforced ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=stata_data)
summary(reg2_2)
##
## Call:
## glm(formula = tookup_afterdead_enforced ~ offer4 + low + med +
## waved2 + waved3, family = binomial(link = "probit"), data = stata_data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0030 -0.4755 -0.4742 -0.4665 2.1327
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.9695075 0.0333609 -29.061 <0.0000000000000002 ***
## offer4 -0.0005594 0.0029844 -0.187 0.851
## low 0.7057192 0.0198982 35.467 <0.0000000000000002 ***
## med 0.5628594 0.0213071 26.416 <0.0000000000000002 ***
## waved2 -0.2892394 0.0230624 -12.542 <0.0000000000000002 ***
## waved3 -0.2709625 0.0222376 -12.185 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 44956 on 53809 degrees of freedom
## Residual deviance: 42776 on 53804 degrees of freedom
## AIC: 42788
##
## Number of Fisher Scoring iterations: 4
As we can see, the p-value of offer4 is 0.851, which is considerably higher than the significance level and substantiates that offer rates at or below the standard rate did not influence take-up after the deadline. This seems conclusive, since after the deadline clients borrowed at the standard rate schedule.
We can also look at the relationship between take-up after the deadline and the offer rate graphically. We will use the ggplot() function again to declare the input data frame, then use stat_smooth() (explained later in detail) to plot our probit regression reg2_2.
2.2.4) Plot the probit regression reg2_2. Just press Check.
ggplot(stata_data, aes(x=offer4, y=tookup_afterdead_enforced)) +
  stat_smooth(method='glm', method.args=list(family=binomial(link='probit'))) +
  ylim(0, 1)
## `geom_smooth()` using formula 'y ~ x'
The graph corroborates our thesis since there is no clear tendency that the probability of a take-up after the deadline increases or decreases with a higher offer rate.
In addition, we want to find out whether the rejection decisions were correlated with the offer rate. The approach is the same as in the previous regression; however, this time we have to restrict the data frame to clients who applied for a loan. Furthermore we will use a new function called stargazer() to get a nicer output table which holds the results of all three regressions.
stargazer is an R package that creates LaTeX/HTML code and ASCII text for regression tables, with multiple models side by side, as well as for summary statistics tables, data frames, vectors and matrices. We use stargazer because the output is well formatted, it supports many models and it offers many formatting options. To create a summary statistics table we just run stargazer(.data). To create a regression table we proceed the same way but with some adjustments: the argument type= determines the output format, and header= indicates whether a header (name and version of the package, author, ...) should appear.
For further information check: https://cran.r-project.org/web/packages/stargazer/stargazer.pdf
We want to set type="html" and header=FALSE.
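A minimal stargazer sketch on the built-in mtcars data, using type="text" so the table prints to the console (in this problem set we use type="html" instead); the package is assumed to be installed:

```r
library(stargazer)  # assumed to be installed

# Two toy regressions on the built-in mtcars data
fit1 <- lm(mpg ~ wt, data = mtcars)
fit2 <- lm(mpg ~ wt + hp, data = mtcars)

# Two models side by side, without the package header
stargazer(fit1, fit2, type = "text", header = FALSE)
```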
2.2.5) Execute the Regression reg2_3 then use the function stargazer() to create an output table of all three regressions: 2_1; 2_2; 2_3. If you need a little hint, press hint.
# reg2_3 <- glm(rejected ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(stata_data, applied == 1))
#
# stargazer(???)
reg2_3 <- glm(rejected ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(stata_data, applied == 1))
stargazer(reg2_1, reg2_2, reg2_3, type="html", header=FALSE)
| | Dependent variable: | | |
|:---|:---:|:---:|:---:|
| | offer4 | tookup_afterdead_enforced | rejected |
| | OLS | probit | probit |
| | (1) | (2) | (3) |
| dormancy | 0.001 | | |
| | (0.002) | | |
| lntrcount | 0.004 | | |
| | (0.013) | | |
| female | 0.024 | | |
| | (0.022) | | |
| dependants | 0.0002 | | |
| | (0.007) | | |
| married | 0.017 | | |
| | (0.023) | | |
| lnage | -0.002 | | |
| | (0.048) | | |
| rural | 0.020 | | |
| | (0.029) | | |
| edhi | -0.013 | | |
| | (0.022) | | |
| itcscore_100 | 0.005 | | |
| | (0.014) | | |
| itczero | 0.035 | | |
| | (0.097) | | |
| appscore_100 | -0.065 | | |
| | (0.135) | | |
| offer4 | | -0.001 | 0.009 |
| | | (0.003) | (0.011) |
| low | -2.486*** | 0.706*** | -0.626*** |
| | (0.039) | (0.020) | (0.069) |
| med | -1.075*** | 0.563*** | -0.285*** |
| | (0.041) | (0.021) | (0.065) |
| waved2 | -0.291*** | -0.289*** | -0.170** |
| | (0.039) | (0.023) | (0.072) |
| waved3 | -0.297*** | -0.271*** | -0.597*** |
| | (0.038) | (0.022) | (0.073) |
| Constant | 8.651*** | -0.970*** | -0.617*** |
| | (0.171) | (0.033) | (0.111) |
| Observations | 53,554 | 53,810 | 4,540 |
| R2 | 0.112 | | |
| Adjusted R2 | 0.112 | | |
| Log Likelihood | | -21,387.990 | -1,777.595 |
| Akaike Inf. Crit. | | 42,787.980 | 3,567.191 |
| Residual Std. Error | 2.327 (df = 53538) | | |
| F Statistic | 450.744*** (df = 15; 53538) | | |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
The first column shows the result of our first regression reg2_1, the second column that of reg2_2 and the third that of reg2_3. We see that the coefficient of offer4 is once again far from significant, which corroborates that the rejection decision was not influenced by the offer rate.
To sum up everything that has been stated so far: we found that our randomization process was successful and that the offer rate is not affected by other observable characteristics.
You successfully completed the first part of our regression analysis! You are on the way to become the next Adrien-Marie Legendre who was the first person to discover the method of least squares!
In the first two sections we became familiar with our parameters and verified the randomization process. Now we want to understand the empirical strategy.
So far we learned a lot about our data and the randomization process thus in this section we will apply our newly acquired knowledge to map our data into testable predictions. In the narrow sense we are interested in the response of loan demand to changes in price and maturity. Our basic model for the estimation is the following: \[y_i = f(C_i, X_i)\] In our model \(y\) measures the extensive (take-up) and intensive (loansize) demand. While \(i\) indicates one of the 53,810 borrowers. The variable \(C_i\) is a vector including the offer rate (\(r_i\)) and the maturity (\(m_i\)). The variables we used for the randomization process of \(r_i\) - the pre-approved risk category (low/medium/high) and the mailer wave (July/September/October) - are included in \(X_i\).
We often use binary variables as independent variables in regressions. But in our case we want to use a binary variable as a dependent variable, meaning it is 1 if something occurs and 0 otherwise. It is possible to apply the ordinary least squares method (OLS) when the dependent variable \(y\) is binary; this is called the Linear Probability Model (LPM). The LPM is simply OLS applied to a binary dependent variable: \[Y_i = \beta_0 + \beta_1{X_{1i}} + ... + \beta_k{X_{ki}} + \epsilon_i\] To analyze the model we look at the conditional expectation of the dependent variable \(Y\) and see that: \[E[Y|X] = P(Y=1 | X)\] Applying this to the model above, and using that the expectation of the error term \(\epsilon_i\) given \(X\) is 0, we get: \[E[Y|X] = \beta_0 + \beta_1{X_{1i}} + ... + \beta_k{X_{ki}} = P(Y=1 | X)\] A change in \(X_i\) by one unit therefore changes the predicted probability by \(\beta_i\). In other words, the coefficient \(\beta_i\) can be interpreted as the change in the probability of \(Y=1\) associated with a unit change in \(X_i\), for given values of \(x_1,...,x_k\).
However, the classic LPM has a fundamental problem. We will show this using the mtcars data set, which ships with base R. The following plots show the difference between an LPM and a probit or logit model, and why probit or logit models are better suited for our future regressions. Just press check to have a look at the plots and answer the question below:
lm_plot <- ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() +
stat_smooth(method="lm", se=TRUE) +
geom_hline(yintercept=0, col="red") +
geom_hline(yintercept=1, col="red") +
ylim(-0.25, 1.5) +
ggtitle("Linear Probability Model")
probit_plot <- ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() +
stat_smooth(method="glm", method.args=list(family=binomial(link="probit")), se=TRUE) +
geom_hline(yintercept=0, col="red") +
geom_hline(yintercept=1, col="red") +
ylim(-0.25, 1.5) +
ggtitle("Probit Model")
logit_plot <- ggplot(mtcars, aes(x=mpg, y=vs)) + geom_point() +
stat_smooth(method="glm", method.args=list(family=binomial(link="logit")), se=TRUE) +
geom_hline(yintercept=0, col="red") +
geom_hline(yintercept=1, col="red") +
ylim(-0.25, 1.5) +
ggtitle("Logit Model")
grid.arrange(lm_plot, probit_plot, logit_plot, ncol=2)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Quiz: What could be the problem with the Linear Probability Model?
It is not possible to use the LPM if the dependent variable is binary. [ ]
In an LPM it is possible to get a probability below zero or above 1. [x]
There is no problem with the LPM. [ ]
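The problem can also be seen numerically. Here is a minimal sketch on simulated data (not the study data) showing that OLS fitted values on a binary outcome can leave the interval [0, 1]:

```r
# LPM on a simulated binary outcome: fitted "probabilities" escape [0, 1].
# The data below are made up purely for illustration.
set.seed(1)
x <- seq(-3, 3, length.out = 200)
y <- as.numeric(x + rnorm(200) > 0)  # binary (0/1) outcome

lpm <- lm(y ~ x)
range(fitted(lpm))  # lower bound below 0, upper bound above 1
```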
As you can see in the plots above, probit and logit models are better suited for a regression with a binary dependent variable. They are in fact specifically designed for this case and always yield a probability between 0 and 1. Now we have to choose between the probit and the logit model. The main difference between them lies in the assumptions made about the error distribution: logit assumes a logistic error distribution, while probit assumes a normal error distribution. Since much more is known about the normal distribution, we will use the probit model in our future regressions with a binary dependent variable. Now we are able to use this to analyze the response of loan demand to changes in price and maturity.
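The two link functions can be compared directly in R: pnorm() is the probit (normal CDF) link and plogis() the logit (logistic CDF) link; both map any real number into (0, 1). The input values are illustrative:

```r
z <- c(-5, 0, 5)
pnorm(z)   # probit link: near 0, exactly 0.5, near 1
plogis(z)  # logit link: heavier tails, so the extreme values are less extreme
```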
A difficulty with estimating loan demand elasticities is that contract terms are often subject to external influences, such as alternative financing opportunities or other supply decisions. Regarding price sensitivity, we approach the problem by randomizing the interest rate within the client's risk category. This allows us to observe what happens if we change the loan price, in our case the interest rate. To achieve this, we estimate a probit model of the form:
\[ a_i = \alpha + \beta{r_i} + \delta{X_i} + \epsilon_{ib} \]
In our model \(a_i\) is the dependent variable applied, which is 1 if client \(i\) applied for a loan and 0 otherwise. The offer rate offer4 \(r_i\) is orthogonal to the error term \(\epsilon_{ib}\) by construction, and therefore \(\beta\) is an unbiased estimate of the price sensitivity of loan take-up from direct mail offers. We expect \(\beta < 0\), since almost every model of consumer choice predicts that demand slopes downward as the price increases.
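The logic of recovering \(\beta\) can be sketched with a tiny simulation (all numbers invented, not the study data): generate take-up from a probit model with a negative price coefficient and estimate it back with glm():

```r
# Simulated probit take-up model; the rate range and the true coefficient
# -0.05 are made up for illustration only.
set.seed(42)
n <- 5000
offer   <- runif(n, 3, 12)                        # monthly offer rate in percent
applied <- rbinom(n, 1, pnorm(-1 - 0.05 * offer)) # true probit relationship

fit <- glm(applied ~ offer, family = binomial(link = "probit"))
coef(fit)["offer"]  # negative, close to the true value -0.05
```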
Wow, you now understand the probit model. Its underlying model is based on the Weber-Fechner law, formulated by Gustav Fechner and published in Fechner (1860). He was a German experimental psychologist, philosopher and physicist, and is said to be the founder of psychophysics.
Since you now learned a little bit about our theoretical model, let’s do a short Quiz based on our data:
Quiz: What do you think would happen to take-up if we increased the interest rate by 100 basis points? Just type “reduces” or “increases” in the blank box.
Answer: reduces
In the next section we will deal with this subject more intensively.
So far we have talked a lot about our data set and our theoretical strategy. In this section we want to obtain our first tangible results: in the first part of exercise 4 we will estimate the price elasticity of loan demand.
4.1.1) Since this is a new exercise, press Check to load the data once again.
stata_data <- read_dta("~/Documents/GitHub/thesis_code_rep/kz_demandelasts_aer08.dta")
As mentioned in the last section, we use a probit model to estimate the price elasticity of loan demand. We begin with borrowers who applied for a loan before the deadline ended. Though instead of the lm() function, we will use the generalized linear model function glm(), because it enables us to perform a probit regression.
We want to start with the clients who received offers at or below the standard rate of their risk category. As a reminder, the standard rate per month was 7.75 percent for a low risk client, 9.75 percent for a medium risk client and 11.75 percent for a client in the high risk category. To do so we have to reduce the data set to the clients who received an offer at or below their standard rate (normrate_less == 1) with the filter() function. Recall that applied is our binary dependent variable, and offer4 as well as the control variables low, med, waved2 and waved3 are our explanatory variables.
4.1.2) Use glm() to perform our first probit regression. Regress applied on offer4, low, med, waved2 and waved3, and filter our data set stata_data for normrate_less == 1. Store this regression in reg3_1. Afterwards display the regression results with the summary() command. Finally press Check, or press hint if you need help.
# Perform a Probit regression and show the regression results
reg3_1 <- glm(applied ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(stata_data, normrate_less == 1))
summary(reg3_1)
##
## Call:
## glm(formula = applied ~ offer4 + low + med + waved2 + waved3,
## family = binomial(link = "probit"), data = filter(stata_data,
## normrate_less == 1))
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.6919 -0.3799 -0.3514 -0.3337 2.4444
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.32864 0.03929 -33.815 < 0.0000000000000002 ***
## offer4 -0.01998 0.00361 -5.536 0.0000000309 ***
## low 0.57798 0.02270 25.461 < 0.0000000000000002 ***
## med 0.59713 0.02369 25.210 < 0.0000000000000002 ***
## waved2 -0.04692 0.02892 -1.622 0.10472
## waved3 -0.07745 0.02820 -2.746 0.00603 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 30825 on 53177 degrees of freedom
## Residual deviance: 29453 on 53172 degrees of freedom
## AIC: 29465
##
## Number of Fisher Scoring iterations: 5
From the computed summary we can see that the offer rate offer4 is significant at the 0.1 percent level, which means that a coincidental connection between applied and offer4 is very unlikely. Unfortunately, raw probit coefficients are not equal to marginal effects. We can nevertheless say that an increase of the offer rate by 100 basis points is negatively associated with loan take-up, suggesting that clients who received an offer at or below the standard rates are less likely to apply if the offer rate is 100 basis points higher.
Next, we want to estimate the price elasticity of loan demand for clients who received an offer above their standard rate, and whether the price sensitivity changed when the lender offered rates above the standard rates. To estimate the price sensitivity of clients who were offered a rate above the standard for their risk category, we can use the same regression as in task 4.1.2), except that we filter on normrate_less == 0. To analyze whether the price sensitivity changed when the lender offered rates above the client's standard rate, we regress applied on normrate_more. This time, however, we will calculate the corresponding marginal effects of the two new regressions and of regression reg3_1 from task 4.1.2). To do this we use functions from the regtools package, in particular the showreg() function, which can show marginal effects in a glm model and allows for robust standard errors (https://rdrr.io/github/skranz/regtools/src/R/showreg.r).
The showreg() function from the regtools package extends and wraps either stargazer or the screenreg, texreg and htmlreg functions of the texreg package. It allows for robust standard errors and can show marginal effects in glm models. To show the marginal effects of a probit regression it uses the argument coef.transform="mfx", based on the mfx package.
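As a rough sketch of what coef.transform="mfx" reports, the average marginal effect (AME) of a probit coefficient can be computed by hand: scale the coefficient by the average normal density evaluated at the linear predictor. Simulated data, not the study data:

```r
# Probit average marginal effect, computed manually:
# AME_j = mean( dnorm(x_i' beta) ) * beta_j
set.seed(1)
x <- rnorm(1000)
y <- rbinom(1000, 1, pnorm(0.5 * x))  # invented data-generating process

fit <- glm(y ~ x, family = binomial(link = "probit"))
xb  <- predict(fit, type = "link")        # linear predictor for each observation
ame <- mean(dnorm(xb)) * coef(fit)["x"]   # average marginal effect of x
ame
```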
4.1.3) Just press Check to perform the two regressions explained above and show the marginal probit effects of all three regressions.
reg3_2 <- glm(applied ~ offer4 + low + med, family = binomial(link = "probit"), data=filter(stata_data, normrate_less == 0))
reg3_3 <- glm(applied ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data=stata_data)
showreg(list("(1)"=reg3_1,"(2)"=reg3_2, "(3)"=reg3_3), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)", digits=3)
==========================================================
                   (1)           (2)           (3)
----------------------------------------------------------
offer4             -0.003 ***    -0.017 *
                   (0.001)       (0.009)
low                 0.112 ***     0.017         0.124
                   (0.006)       (0.043)       (0.005)
med                 0.119         0.024         0.124
                   (0.006)       (0.038)       (0.006)
waved2             -0.007                      -0.008
                   (0.004)                     (0.004)
waved3             -0.011                      -0.012
                   (0.004)                     (0.004)
normrate_more                                  -0.030
                                               (0.008)
----------------------------------------------------------
AIC             29464.760       299.792     29791.929
BIC             29518.049       317.587     29845.288
Log Likelihood -14726.380      -145.896   -14889.965
Deviance        29452.760       291.792    29779.929
Num. obs.       53178           632        53810
==========================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
Column (1) presents the probit marginal effect for clients who received offers at or below the standard rate of their risk category. A 100-basis-point increase in the monthly interest rate is associated with a take-up reduced by 0.3 percentage points. This is a very small effect: since we know from exercise 2 that for these clients the price ranges between 3.25% and 11.75%, a price decrease from the maximum to the minimum would increase take-up by only about 2.6 percentage points (8.5 × 0.3 ≈ 2.6). Column (2) shows that a 100-basis-point increase in the monthly interest rate has a larger effect on clients who received offers above the standard rate of their risk category: here the price increase lowers take-up by 1.7 percentage points, so the price sensitivity is nearly six times higher for these clients. If we now inspect column (3), it becomes apparent that rates above the standard rate are associated with depressed take-up: such clients are 3 percentage points less likely to apply for a loan.
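The back-of-the-envelope calculation above can be reproduced directly in R (the inputs are the marginal effect from column (1) and the price range from exercise 2):

```r
mfx         <- -0.003          # marginal effect per 100-basis-point increase
price_range <- 11.75 - 3.25    # max minus min monthly rate, in percentage points
price_range * mfx              # -0.0255: about a 2.6 percentage-point change
```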
We can also illustrate the demand price sensitivity graphically. For this we can use the ggplot() function of the ggplot2 package, as in task 2.1.2). To generate the demand curve, the x-axis should show the residuals of a regression of the monthly interest rate on the experimental conditions, and the y-axis the residuals of a regression of take-up on the experimental conditions. One possibility is to show the residuals in a scatter plot using the function geom_point(), also from the ggplot2 package.
4.1.4) Press Check to generate a scatter plot of the demand price sensitivity.
x_plot1 <- lm(offer4 ~ low + med + waved2 + waved3, data=stata_data)
y_plot1 <- lm(tookup ~ low + med + waved2 + waved3, data=stata_data)
ggplot(stata_data, aes(x=x_plot1$residuals, y=y_plot1$residuals)) + geom_point()
As you can see, it can be hard to identify a trend from the points alone. Therefore we use another geom function, geom_smooth(), to replace the scatter plot with a smoothing line. We also add se=FALSE and method="loess" to the geom_smooth() command. We set se=FALSE because the confidence bands would otherwise clutter the plot, since our data set is quite large.
The geom function geom_smooth() is part of the ggplot2 package. It adds a smoothing line to a plot, which makes it easier to identify trends. By default the function also adds confidence bands to the smoothing line, which can clutter the plot if the data set is large; we remove them by setting the argument se to FALSE. Furthermore, we can control the smoothness of the line with the argument span=, which can be adjusted from 1 (very smooth) to 0 (very rough). We can also choose the type of smoother with the method= argument: lm, glm, gam or loess.
For more information check: http://statseducation.com/Introduction-to-R/modules/graphics/smoothing/
4.1.5) Replace the geom_point() function with the geom_smooth() and add the above mentioned conditions. Afterwards press Check.
# ggplot(stata_data, aes(x=x_plot1$residuals, y=y_plot1$residuals)) + geom_point()
ggplot(stata_data, aes(x=x_plot1$residuals, y=y_plot1$residuals)) + geom_smooth(method="loess", se=FALSE, span=0.5)
## `geom_smooth()` using formula 'y ~ x'
The demand curve confirms our thesis that an interest rate decrease is associated with higher take-up and an increased offer rate with lower take-up. However, a kink at an interest rate increase of approximately 150 basis points is particularly noticeable: if the interest rate rises by more than 150 basis points, demand falls strongly.
One explanation for the kink in the demand curve could be that clients borrowed elsewhere if the offer rate was too high. To test this hypothesis we will perform the same regressions as in task 4.1.3), but this time we focus only on clients who ended up borrowing from another financial institution. For this purpose the lender obtained credit bureau data, captured by the variable tookup_outside_only in our data set.
4.1.6) Press Check to perform the regressions of borrowing from other financial institutions on the offer rate. Afterwards use the regression results to answer the question below.
reg3_4 <- glm(tookup_outside_only ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data = filter(stata_data, normrate_less == 1))
reg3_5 <- glm(tookup_outside_only ~ offer4 + low + med, family = binomial(link = "probit"), data = filter(stata_data, normrate_less == 0))
reg3_6 <- glm(tookup_outside_only ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data = stata_data)
showreg(list("(4)"=reg3_4,"(5)"=reg3_5, "(6)"=reg3_6), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)", digits=3)
=========================================================
                   (4)           (5)           (6)
---------------------------------------------------------
offer4              0.001        -0.010
                   (0.001)       (0.018)
low                 0.028 ***     0.007         0.025
                   (0.006)       (0.079)       (0.006)
med                -0.005         0.068        -0.005
                   (0.006)       (0.067)       (0.006)
waved2             -0.054                      -0.053
                   (0.007)                     (0.007)
waved3             -0.050                      -0.049
                   (0.007)                     (0.007)
normrate_more                                   0.005
                                               (0.017)
---------------------------------------------------------
AIC             56387.254       755.138     57138.454
BIC             56440.542       772.933     57191.813
Log Likelihood -28187.627      -373.569   -28563.227
Deviance        56375.254       747.138    57126.454
Num. obs.       53178           632        53810
=========================================================
*** p < 0.001; ** p < 0.01; * p < 0.05
Quiz: Did higher offer rates induce more borrowing from other financial institutions?
A 100-basis-point increase can be associated with a 0.1 higher take-up from an outside lender for clients at or below the standard rate, and a decrease in take-up by 0.1 for clients who received offers higher than the standard rate of their risk category. [ ]
There is no significant relationship between higher offer rates and borrowing from an outside lender. [x]
A 100-basis-point increase can be associated with a -0.1 lower take-up from an outside borrower for clients at or below the standard rate and a increase in take-up by 0.1 for clients who received offers higher the standard rate of their risk category.. [ ]
The point estimates in columns (4) and (5) suggest a weak positive relationship between the offer rate and outside borrowing for clients who received an offer at or below their standard rate, and a weak negative one for clients who received an offer above the standard rate for their risk category. Column (6) likewise hints that a higher offer rate slightly strengthened take-up from other financial institutions. None of these estimates is significant, and the confidence intervals rule out economically large substitution. We therefore cannot say that higher interest rates pushed clients towards other formal financial institutions. Note, however, that this does not rule out substitution towards other financing options such as family, friends or moneylenders.
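The `mfx` transformation applied by `showreg()` above reports average marginal effects rather than raw probit coefficients, so the estimates can be read on the probability scale. As a rough, self-contained sketch of what that transformation does (simulated data; the names `x` and `y` are illustrative and not from the study's data set), the average marginal effect of a regressor is its probit coefficient scaled by the average normal density at the fitted index:

```r
# Illustration only: average marginal effect (AME) for a probit model,
# computed on simulated data (variable names are made up).
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- as.integer(0.5 * x - 0.2 + rnorm(n) > 0)   # probit DGP with slope 0.5

fit <- glm(y ~ x, family = binomial(link = "probit"))

# AME of x: coefficient times the average of the standard normal density
# evaluated at each observation's linear predictor
xb  <- predict(fit, type = "link")
ame <- mean(dnorm(xb)) * coef(fit)["x"]
```

Read this way, a coefficient such as the 0.028 on low in column (4) corresponds approximately to a 2.8 percentage point change in take-up.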
There is another possible explanation for the kink in the demand curve: clients whose offer rate was above the standard rate could simply have waited until the offer deadline expired and then borrowed at the lower standard rate. This is testable by examining the clients' post-deadline borrowing behaviour. If this explanation holds, clients with an offer rate above the standard rate should borrow disproportionately often after the deadline. To test this hypothesis we will perform the regression once again, but this time solely with clients who borrowed after the deadline. To find out which variable identifies those clients, press the Data button.
4.1.6) This time you only have to replace the question marks with the variable which indicates whether clients borrowed after the deadline. If you think your answer is correct, press Check. If you cannot find the variable in the data, just press Hint.
# reg3_7 <- glm(??? ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(stata_data, normrate_less == 1))
#
# reg3_8 <- glm(??? ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data = stata_data)
#
# reg3_9 <- glm(??? ~ offer4 + low + med, family = binomial(link = "probit"), data = filter(stata_data, normrate_less == 0))
#
# showreg(list("(7)"=reg3_7,"(8)"=reg3_8, "(9)"=reg3_9), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)", output = "html", digits=3)
reg3_7 <- glm(tookup_afterdead_enforced ~ offer4 + low + med + waved2 + waved3, family = binomial(link = "probit"), data=filter(stata_data, normrate_less == 1))
reg3_8 <- glm(tookup_afterdead_enforced ~ normrate_more + low + med + waved2 + waved3, family = binomial(link = "probit"), data = stata_data)
reg3_9 <- glm(tookup_afterdead_enforced ~ offer4 + low + med, family = binomial(link = "probit"), data = filter(stata_data, normrate_less == 0))
showreg(list("(7)"=reg3_7,"(8)"=reg3_8, "(9)"=reg3_9), coef.transform=c("mfx", "mfx", "mfx"), omit.coef = "(Intercept)", output = "html", digits=3)
| | (7) | (8) | (9) |
|---|---|---|---|
| offer4 | 0.000 | | -0.012 |
| | (0.001) | | (0.015) |
| normrate_more | | -0.036** | |
| | | (0.011) | |
| low | 0.201*** | 0.200*** | 0.177* |
| | (0.007) | (0.006) | (0.083) |
| med | 0.155*** | 0.154*** | 0.155* |
| | (0.007) | (0.007) | (0.066) |
| waved2 | -0.065*** | -0.065*** | |
| | (0.005) | (0.005) | |
| waved3 | -0.065*** | -0.065*** | |
| | (0.005) | (0.005) | |
| AIC | 42215.085 | 42779.521 | 571.133 |
| BIC | 42268.373 | 42832.880 | 588.928 |
| Log Likelihood | -21101.543 | -21383.761 | -281.566 |
| Deviance | 42203.085 | 42767.521 | 563.133 |
| Num. obs. | 53178 | 53810 | 632 |
| *** p < 0.001; ** p < 0.01; * p < 0.05 | | | |
In fact, we get an unexpected result: higher offer rates are associated with less post-deadline borrowing. The regression reg3_8 in column (8) indicates that receiving an offer above the standard rate is associated with a 3.6 percentage point lower probability of post-deadline borrowing. Moreover, in column (9) a 100-basis-point increase decreases the likelihood of post-deadline borrowing by 1.2 percentage points, although this estimate is imprecise. The result in column (7) does not suggest any economically large substitution either.
Our pattern of results with respect to timing is consistent with switching costs. The regressions in this exercise show that both pre- and post-deadline borrowing decrease in price. An explanation could be that clients facing high offer rates borrowed from other financial institutions before the deadline and afterwards found it too costly to switch back to our lender once the deadline had passed.
In a nutshell, our results on the non-linearities in price sensitivity are consistent with clients turning to other financing options in informal markets and with switching costs.
In this second section of exercise 4 we want to examine the price sensitivity of loan size. The loan size is expressed in South African Rand (R). For your information: R1.00 corresponds to approximately 0.067 US dollars (https://www.xe.com/currencyconverter/convert/?Amount=1&From=ZAR&To=USD, 03/30/2021). Our analysis in this part varies along three dimensions: whether we condition on borrowing, whether we include branch fixed effects, and whether we add controls for demographics and credit risk.
4.2.1) Press Check to load the data once again.
stata_data <- read_dta("~/Documents/GitHub/thesis_code_rep/kz_demandelasts_aer08.dta")
In the first part of this exercise we focus on clients who were randomly assigned an identical offer and contract rate at or below the Lender's standard rate for their risk category. We want to estimate the price sensitivity of the amount borrowed, unconditional on borrowing, i.e. including clients at or below the standard rate who did not borrow at all. In this first subsample the dependent variable loansize only includes pre-deadline borrowing, and we condition only on the risk category (low + med) and the mailer wave (waved2 + waved3). This time we will run a felm() regression from the lfe package. It is similar to lm(), which we used in exercise 2, but felm() is designed to fit linear models with group fixed effects. It uses the method of alternating projections to sweep multiple group effects out of the normal equations before estimating the remaining coefficients with OLS.
The felm() function is part of the lfe package and is intended for large data sets with multiple group fixed effects. The basic call consists of the dependent variable, a formula of four parts, and the data. The first part contains the ordinary independent variables \(x_1 + x_2\), the second part the factors to be projected out, i.e. our fixed effects \(f_1 + f_2\), the third part the IV specification \((Q|W ~ x_3 + x_4)\), and the fourth part the cluster specification for the standard errors. Combined, it looks for example like:
reg <- felm(y ~ x1 + x2 | f1 + f2 | (Q|W ~ x3 + x4) | clu1 + clu2, data=data)
In our analysis we will not use the IV specification; this part, or any other part of the formula we are not using, can simply be specified as 0.
For further information check out: https://rdrr.io/cran/lfe/man/felm.html
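As a minimal sketch of this template on simulated data (the names f1 and clu are illustrative, and this assumes the lfe package is installed):

```r
library(lfe)  # assumed installed; provides felm()

set.seed(1)
n   <- 1000
f1  <- factor(sample(letters[1:10], n, replace = TRUE))  # group fixed effect
clu <- factor(sample(1:20, n, replace = TRUE))           # cluster for the s.e.
x1  <- rnorm(n)
y   <- 2 * x1 + as.numeric(f1) + rnorm(n)                # true slope on x1 is 2

# ordinary regressors | fixed effects | IV spec (unused: 0) | cluster spec
fit <- felm(y ~ x1 | f1 | 0 | clu)
coef(fit)[1]  # close to the true slope of 2
```

The fixed effects in the second part are swept out rather than reported, which is why the low, med and wave dummies will not appear in the coefficient table of our felm() regressions below.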
4.2.2) Perform a linear regression with fixed effects stored in reg4_1. Regress the dependent variable loansize on the ordinary independent variable offer4 and the fixed effects low + med + waved2 + waved3 and cluster it by branchuse. Replace the question marks with the missing variables. If you have no clue how to perform the regression, just press hint for help.
# reg4_1 <- felm(??? ~ ??? | ??? |0| ???, data = filter(stata_data, offer4==final4, normrate_less==1))
# summary(reg4_1)
reg4_1 <- felm(loansize ~ offer4 |low + med + waved2 + waved3 |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1))
summary(reg4_1)
##
## Call:
## felm(formula = loansize ~ offer4 | low + med + waved2 + waved3 | 0 | branchuse, data = filter(stata_data, offer4 == final4, normrate_less == 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -334.2 -77.2 -59.7 -49.8 9771.0
##
## Coefficients:
## Estimate Cluster s.e. t value Pr(>|t|)
## offer4 -4.368 1.093 -3.996 0.0000646 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 506.3 on 31225 degrees of freedom
## Multiple R-squared(full model): 0.03295 Adjusted R-squared: 0.03279
## Multiple R-squared(proj model): 0.0004047 Adjusted R-squared: 0.0002446
## F-statistic(full model, *iid*):212.8 on 5 and 31225 DF, p-value: < 0.00000000000000022
## F-statistic(proj model): 15.97 on 1 and 107 DF, p-value: 0.0001187
## *** Standard errors may be too high due to more than 2 groups and exactDOF=FALSE
The estimated coefficient shows that each 100-basis-point increase in the interest rate is associated with a decrease in loan size of approximately R4.4. We can use this result to calculate the elasticity: we need the mean loan size and the mean offer rate, which we plug into the formula for the elasticity of demand.
The price elasticity of demand can be approximated at the sample means:
elasticity = estimated coefficient x (mean offer rate / mean loan size)
Quiz: Calculate the price elasticity of demand for our sub-sample. The mean offer rate is 7.8 and the mean loan size is 106. Type your answer in the box below, rounded to two digits.
Answer: -0.32
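The answer follows directly from the formula above; plugging in the reg4_1 estimate and the stated sample means:

```r
# Implied loan-size elasticity at the sample means (numbers quoted in the text)
coef_offer4   <- -4.368  # estimated coefficient on offer4 from reg4_1
mean_offer4   <- 7.8     # mean offer rate in the sub-sample
mean_loansize <- 106     # mean loan size in the sub-sample

elasticity <- coef_offer4 * (mean_offer4 / mean_loansize)
round(elasticity, 2)
# -0.32
```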
We also want to explore the loan size demand estimate for a sub-sample that includes non-borrowers. We re-estimate the model from the previous regression, but add additional control variables and branch fixed effects. The additional controls added to the unconditional specifications include: quadratics in internal credit score, external credit score and gross income at time of pre-approval; months since last loan with the lender; number of prior loans with the lender; gender; number of dependents; marital status; a quadratic in age; rural residence; education; and province. Controls for the conditional specifications include net income at the time of approval. The omit argument of the stargazer package suppresses the control variables in the stargazer output below.
4.2.3) Press Check to re-estimate the regression from 4.2.2) with the additional control variables.
stata_data <- stata_data %>% mutate(grossincomesq = grossincome^2, agesq = age^2, appscoresq = appscore^2, itcscoresq = itcscore^2, sales_netincomesq = sales_netincome^2, sales_grossincomesq = sales_grossincome^2)
reg4_2 <- felm(loansize ~ offer4 + grossincome + grossincomesq + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq | low + med + waved2 + waved3 + female + married + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1))
stargazer(reg4_1, reg4_2, omit = c("grossincome", "grossincomesq", "dormancy", "trcount", "dependants", "age", "agesq", "appscore", "appscoresq", "itcscore", "itcscoresq"), type="html", header = FALSE, se = list(summary(reg4_1)$coefficients[, 2], summary(reg4_2)$coefficients[, 2]))
| Dependent variable: | ||
| loansize | ||
| (1) | (2) | |
| offer4 | -4.368*** | -4.394*** |
| (1.093) | (1.143) | |
| Observations | 31,231 | 28,197 |
| R2 | 0.033 | 0.062 |
| Adjusted R2 | 0.033 | 0.057 |
| Residual Std. Error | 506.350 (df = 31225) | 499.250 (df = 28061) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
As we can see, the estimate barely changes when the additional controls are added and non-borrowers remain included. This seems consistent with non-borrowers having the same intensive-margin price sensitivity as borrowers, which we saw in exercise 4.1. However, as discussed in the previous exercises, the result is difficult to interpret, because price sensitivity is nonzero on the extensive margin: the loan size demanded may be affected by characteristics other than the risk category. An interpretation of the loan size elasticity results is therefore more useful for the subsample of clients who actually borrowed from our lender.
Now we perform the regression conditional on borrowing, in other words on borrowers only. In column (1) we regress the loan size on the standard conditions of the experiment for borrowers only. In column (2) we add the additional controls for selection, and in column (3) we run a Tobit regression to find out whether loan size demand may be censored by supply constraints.
The elasticity of demand can again be approximated at the sample means:
elasticity = estimated coefficient x (mean offer rate / mean loan size)
4.2.4) Press Check to run the above mentioned regressions.
stata_data <- stata_data %>% mutate(grossincomesq = grossincome^2, agesq = age^2, appscoresq = appscore^2, itcscoresq = itcscore^2, sales_netincomesq = sales_netincome^2, sales_grossincomesq = sales_grossincome^2)
reg4_3 <- felm(loansize ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
reg4_4 <- felm(loansize ~ offer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
reg4_5 <- censReg(loansize ~ offer4 + low + med + waved2 + waved3 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq + female + married + rural + edhi + appscore0 + itczero + province, data = stata_data)
stargazer(reg4_3, reg4_4, reg4_5, omit = c("low", "med", "waved2" ,"waved3","grossincome" , "grossincomesq" , "dormancy" , "trcount" , "female" , "dependants" , "married" ,"age" , "agesq" , "rural" ,"edhi" , "appscore", "appscoresq" , "appscore0" , "itcscore" , "itcscoresq" , "itczero" , "branchuse" , "province"), type="html", header = FALSE, digits=5)
## Error in attr(ll, "df") <- sum(activePar(object)): attempt to set an attribute on NULL
As we can see in column (1), a 100-basis-point increase in the offer rate for borrowers is associated with a loan size that is lower by R25.876. Applying the elasticity formula to this result gives an implied elasticity of -0.13, which seems comparatively small. Column (2) presents the estimate of loan size price sensitivity conditional on borrowing and with the additional control variables. It shows a slightly larger decrease in loan size of R33.715 and a correspondingly larger elasticity of -0.17. The result of the Tobit regression in column (3), without branch fixed effects, does not differ significantly from column (2), which indicates that loan size demand is not censored by supply constraints.
We can portray the conditional loan size results graphically as a demand curve, as we did for take-up in exercise 4.1.
4.2.5) Press Check to generate the demand curve for borrowers only.
data_plot2 <- stata_data %>% filter(tookup==1)
x_plot2 <- lm(offer4 ~ low + med + waved2 + waved3, data=data_plot2)
y_plot2 <- lm(loansize ~ low + med + waved2 + waved3, data=data_plot2)
ggplot(data_plot2, aes(x=x_plot2$residuals, y=y_plot2$residuals)) + geom_smooth(method = "loess", span=0.5)
## `geom_smooth()` using formula 'y ~ x'
The graph shows that an increase in the interest rate has a negative effect on the loan size, while a decrease in the offer rate has a positive effect on the amount borrowed.
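Why does plotting residuals against residuals trace out the demand relationship net of the experimental strata? This is the Frisch–Waugh–Lovell logic: the slope of the y-residuals on the x-residuals equals the slope from the full multiple regression. A self-contained sketch with simulated data (the names w, x and y are illustrative):

```r
# Frisch-Waugh-Lovell: partialling the controls out of both sides
# reproduces the multiple-regression slope exactly (simulated data).
set.seed(1)
n <- 2000
w <- rnorm(n)                  # control (plays the role of the low/med/wave dummies)
x <- 0.6 * w + rnorm(n)        # regressor correlated with the control
y <- 1.5 * x + 2 * w + rnorm(n)

full <- lm(y ~ x + w)
rx   <- resid(lm(x ~ w))       # x net of the control
ry   <- resid(lm(y ~ w))       # y net of the control
part <- lm(ry ~ rx)

all.equal(unname(coef(full)["x"]), unname(coef(part)["rx"]))
# TRUE
```

Our plot simply replaces the single control w with the risk-category and wave dummies, and fits a loess curve instead of a straight line.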
Now we want to take a look at log-log specifications of our previous regressions. The log-log regression estimates can be used as an alternative way of determining demand elasticities. The difference from the linear-linear model is that we use the logarithmized values of the \(y\) and \(x\) variables. The estimate of a log-log regression can be interpreted as follows: if we increase the \(x\) variable by one percent, \(y\) changes on average by \(\beta_1\) percent.
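A quick self-contained check of this interpretation (simulated data with a true elasticity built in; none of these names come from the study's data set):

```r
# Constant-elasticity demand: q = A * p^eps * noise,
# so log(q) = log(A) + eps * log(p) + e and the log-log slope is the elasticity.
set.seed(1)
n   <- 5000
p   <- runif(n, 5, 12)                      # a hypothetical "offer rate"
eps <- -0.15                                # true price elasticity
q   <- exp(4 + eps * log(p) + rnorm(n, sd = 0.3))

fit <- lm(log(q) ~ log(p))
round(coef(fit)[2], 2)  # close to the true elasticity of -0.15
```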
4.2.6) Change the loansize and offer4 variables to their logarithmized (ln) values. Press Check afterwards.
# reg4_6 <- felm(loansize ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(stata_data, offer4==final4, tookup==1, normrate_less==1))
#
# reg4_7 <- felm(loansize ~ offer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
#
# reg4_8 <- felm(loansize ~ offer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married + rural + edhi + appscore0 + itczero + province |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
#
# stargazer(reg4_6, reg4_7, reg4_8, omit = c("low", "med", "waved2" ,"waved3","grossincome" , "grossincomesq" , "dormancy" , "trcount" , "female" , "dependants" , "married" ,"age" , "agesq" , "rural" ,"edhi" , "appscore", "appscoresq" , "appscore0" , "itcscore" , "itcscoresq" , "itczero" , "branchuse" , "province"), type="html", header = FALSE)
reg4_6 <- felm(lnloansize ~ lnoffer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(stata_data, offer4==final4, tookup==1, normrate_less==1))
reg4_7 <- felm(lnloansize ~ lnoffer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married + rural + edhi + appscore0 + itczero + branchuse + province |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
reg4_8 <- felm(lnloansize ~ lnoffer4 + appscore + appscoresq + itcscore + itcscoresq + trcount + age + dormancy + dependants + agesq + sales_grossincome + sales_grossincomesq + sales_netincome + sales_netincomesq | low + med + waved2 + waved3 + female + married + rural + edhi + appscore0 + itczero + province |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
stargazer(reg4_6, reg4_7, reg4_8, omit = c("low", "med", "waved2" ,"waved3","grossincome" , "grossincomesq" , "dormancy" , "trcount" , "female" , "dependants" , "married" ,"age" , "agesq" , "rural" ,"edhi" , "appscore", "appscoresq" , "appscore0" , "itcscore" , "itcscoresq" , "itczero" , "branchuse" , "province"), type="html", header = FALSE)
| Dependent variable: | |||
| lnloansize | |||
| (1) | (2) | (3) | |
| lnoffer4 | -0.113** | -0.143*** | -0.138*** |
| (0.049) | (0.040) | (0.041) | |
| sales_netincome | 0.0001*** | 0.0001*** | |
| (0.00001) | (0.00001) | ||
| sales_netincomesq | -0.000*** | -0.000*** | |
| (0.000) | (0.000) | ||
| Observations | 2,325 | 2,304 | 2,304 |
| R2 | 0.058 | 0.342 | 0.297 |
| Adjusted R2 | 0.056 | 0.308 | 0.288 |
| Residual Std. Error | 0.714 (df = 2319) | 0.610 (df = 2190) | 0.619 (df = 2274) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
The log-log specifications estimate a loan demand elasticity of -0.11 for borrowers without additional control variables in column (1), and -0.14 with additional controls and branch fixed effects in column (2). For the specification in column (3), which drops the branch fixed effects, we likewise observe an elasticity of -0.14.
All in all, we still find loan size demand elasticities that are quite low.
Alfred Marshall was the first to develop the standard supply and demand graph. In his most important book, Principles of Economics, Marshall emphasized that the price and output of a good are determined by both supply and demand: the two curves are like scissor blades that intersect at equilibrium.
In this section we want to determine the optimal pricing strategy for our lender. To do so, we combine the average price elasticity of demand results from the prior section with additional information on revenues and repayment.
5.1.1) Press Check to load the data once again.
stata_data <- read_dta("~/Documents/GitHub/thesis_code_rep/kz_demandelasts_aer08.dta")
We already know how an adjusted offer rate affects clients' take-up and loan size. Now we want to determine the consequences for the lender, or rather whether an increased offer rate would be profitable for the lender. Here we encounter an alleged point of criticism of microcredit: it is often assumed that the best profit-maximization strategy for lenders is to charge exorbitant interest rates, which seems logical at first sight.
To examine this thesis, let us first look at the optimal price for the lender. We aggregate the revenue and repayment results (grossinterest) over the entire sample frame for clients who borrowed at or below the lender's standard rate. This provides us with information on the price sensitivity of gross revenue obtained on initial pre-deadline borrowing. In other words, it shows us how an adjusted interest rate affects the lender's gross revenue.
5.1.2) Just press Check to perform the regression of grossinterest on offer4.
reg5_1 <- felm(grossinterest ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1))
stargazer(reg5_1, type="html", header=FALSE)
| Dependent variable: | |
| grossinterest | |
| offer4 | 2.553*** |
| (0.438) | |
| Observations | 31,231 |
| R2 | 0.020 |
| Adjusted R2 | 0.020 |
| Residual Std. Error | 264.218 (df = 31225) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 |
Before we continue with the interpretation of the regression output, let us create a regression-adjusted curve for revenue with respect to price.
5.1.3) Just press Check and the code below will create the demand curve.
data_plot3 <- stata_data %>% filter(offer4==final4, normrate_less==1)
x_plot3 <- felm(offer4 ~ low + med + waved2 + waved3, data=data_plot3)
y_plot3 <- felm(grossinterest ~ low + med + waved2 + waved3, data=data_plot3)
ggplot(data_plot3, aes(x=x_plot3$residuals, y=y_plot3$residuals)) + geom_smooth(method = "loess", span=0.8, se=FALSE)
## `geom_smooth()` using formula 'y ~ x'
If we take a look at both the regression output in 5.1.2) and the curve in 5.1.3), we see that the revenue curve slopes slightly upwards in price for clients who borrowed at or below the standard rate. Furthermore, our results quantify the effect of an adjusted offer rate.
Finish the following sentence (rounded to one digit) and write your answer in the answer box below:
Quiz: The gross revenue result implies that a 100-basis-point decrease of the offer rate reduces gross revenue by R???.
Answer: 2.6
Higher interest rates could also affect repayment. To check this, we regress the average amount past due (pstdue_average) on the offer rate, once as a linear fixed-effects regression with felm() and once as a censored (Tobit) regression for borrowers.
reg5_2 <- felm(pstdue_average ~ offer4 | low + med + waved2 + waved3 |0| branchuse, data = filter(stata_data, offer4==final4, normrate_less==1))
reg5_3 <- censReg(pstdue_average ~ offer4 + low + med + waved2 + waved3, data = filter(stata_data, offer4==final4, normrate_less==1, tookup==1))
stargazer(reg5_2, reg5_3, type="text", align=TRUE, dep.var.labels="Average past due", covariate.labels=c("interest rate in pp terms (e.g., 8.2)"), no.space=TRUE)
| | Dependent variable: | |
| | Average past due | |
| | (1) felm | (2) censored regression |
|---|---|---|
| interest rate in pp terms (e.g., 8.2) | 12.161*** | 18.064*** |
| | (3.523) | (5.935) |
| low | | -275.938 |
| | | (35.543) |
| med | | 114.872 |
| | | (33.159) |
| waved2 | | -16.009 |
| | | (43.927) |
| waved3 | | 125.466*** |
| | | (41.683) |
| logSigma | | 6.319*** |
| | | (0.021) |
| Constant | | -211.839 |
| | | (60.726) |
| Observations | 2,325 | 2,325 |
| R2 | 0.050 | |
| Adjusted R2 | 0.048 | |
| Log Likelihood | | -10,179.040 |
| Akaike Inf. Crit. | | 20,372.080 |
| Bayesian Inf. Crit. | | 20,412.340 |
| Residual Std. Error | 365.427 (df = 2319) | |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
stata_data <- stata_data %>% mutate(grossincomesq = grossincome^2, agesq = age^2, appscoresq = appscore^2, itcscoresq = itcscore^2, sales_netincomesq = sales_netincome^2, sales_grossincomesq = sales_grossincome^2)
reg6_1 <- lm(loansize ~ offer4 + low + med + waved2 + waved3 + grossincome + grossincomesq + dormancy + trcount + female + dependants + married + age + agesq + rural + edhi + appscore + appscoresq + appscore0 + itcscore + itcscoresq + itczero + province + branchuse, data = filter(stata_data, offer4==final4, normrate_less==1))
# edhi=1 -> High education
reg6_2 <- lm(loansize ~ offer4 + low + med + waved2 + waved3 + grossincome + grossincomesq + dormancy + trcount + female + dependants + married + age + agesq + rural + appscore + appscoresq + appscore0 + itcscore + itcscoresq + itczero + province + branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, edhi==1))
# dormancy<10 -> Borrowed in last 9 months
reg6_3 <- lm(loansize ~ offer4 + low + med + waved2 + waved3 + grossincome + grossincomesq + trcount + female + dependants + married + age + agesq + rural + edhi + appscore + appscoresq + appscore0 + itcscore + itcscoresq + itczero + province + branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, dormancy<10))
reg6_4 <- lm(loansize ~ offer4+ low + med + waved2 + waved3 + grossincome + grossincomesq + dormancy + female + dependants + married + age + agesq + rural + edhi + appscore + appscoresq + appscore0 + itcscore + itcscoresq + itczero + province + branchuse, data = filter(stata_data, offer4==final4, normrate_less==1, trcount>2))
stargazer(reg6_1, reg6_2, reg6_3, reg6_4, type="html", header = FALSE)
| Dependent variable: | ||||
| loansize | ||||
| (1) | (2) | (3) | (4) | |
| offer4 | -4.394*** | -5.397** | -6.591** | -5.498** |
| (1.273) | (2.478) | (2.732) | (2.169) | |
| low | 207.747*** | 290.921*** | 217.114*** | 203.315*** |
| (11.862) | (23.001) | (16.873) | (19.128) | |
| med | 120.032*** | 161.366*** | 129.626*** | 129.120*** |
| (12.368) | (24.017) | (16.667) | (19.829) | |
| waved2 | 48.842 | 46.996 | -3.268 | 133.057 |
| (83.299) | (148.079) | (163.354) | (127.097) | |
| waved3 | ||||
| grossincome | 10.649*** | 11.454*** | 16.513*** | 13.252*** |
| (1.823) | (2.812) | (3.439) | (3.081) | |
| grossincomesq | -0.101*** | -0.103*** | -0.134*** | -0.110*** |
| (0.030) | (0.039) | (0.044) | (0.039) | |
| dormancy | -5.146*** | -6.460*** | -5.710*** | |
| (0.602) | (1.162) | (1.211) | ||
| trcount | 1.493 | 2.428 | 2.720 | |
| (0.954) | (1.964) | (1.757) | ||
| female | 1.792 | 26.880** | 1.275 | 8.627 |
| (6.738) | (13.014) | (13.264) | (11.065) | |
| dependants | -3.859* | -9.603** | -1.238 | -4.310 |
| (2.087) | (4.674) | (4.175) | (3.331) | |
| married | 12.176* | 10.824 | 12.708 | 11.551 |
| (6.870) | (13.039) | (13.624) | (11.159) | |
| age | 5.977*** | 14.655*** | 9.297** | 5.389* |
| (1.890) | (4.657) | (3.756) | (3.229) | |
| agesq | -0.071*** | -0.174*** | -0.113*** | -0.065** |
| (0.020) | (0.053) | (0.040) | (0.033) | |
| rural | 224.128 | -56.789 | 235.313 | 236.547 |
| (354.048) | (81.930) | (477.194) | (596.293) | |
| edhi | 28.651*** | 54.942*** | 40.833*** | |
| (7.004) | (13.905) | (11.498) | ||
| appscore | -0.200 | -5.555 | -3.298 | 0.689 |
| (1.982) | (4.173) | (3.914) | (3.243) | |
| appscoresq | 0.018 | 0.090 | 0.081 | 0.009 |
| (0.032) | (0.066) | (0.065) | (0.052) | |
| appscore0 | -76.879 | -255.116 | -86.766 | |
| (252.135) | (441.498) | (348.847) | ||
| itcscore | 0.242 | 0.179 | 0.184 | 0.904 |
| (0.551) | (1.024) | (1.077) | (0.923) | |
| itcscoresq | -0.0003 | -0.0002 | -0.0003 | -0.001 |
| (0.0004) | (0.001) | (0.001) | (0.001) | |
| itczero | 45.944 | 55.343 | 11.548 | 251.203 |
| (170.490) | (311.516) | (334.405) | (285.128) | |
| provinceFree State | -124.674 | -155.090 | 83.234 | -253.397 |
| (506.852) | (631.620) | (219.514) | (609.991) | |
| provinceGauteng | 19.473 | -13.308 | -35.707 | -12.774 |
| (30.237) | (64.251) | (60.245) | (51.499) | |
| provinceKwazulu-Natal | -258.388 | -73.361 | -225.088 | -388.133 |
| (363.803) | (171.213) | (504.756) | (610.141) | |
| provinceLimpopo Province | -340.390 | 8.661 | -306.820 | -268.466 |
| (506.866) | (110.186) | (692.832) | (600.075) | |
| provinceMpumalanga | -135.813 | -154.729 | -147.600 | -198.706 |
| (354.088) | (436.276) | (673.307) | (597.104) | |
| provinceNorth West | -131.135 | -270.374 | -375.009 | -185.506 |
| (354.174) | (614.868) | (674.674) | (423.433) | |
| provinceWestern Cape | 30.067 | -25.294 | 15.455 | -20.065 |
| (34.358) | (75.213) | (67.830) | (60.347) | |
| branchuseCAD | -67.903* | -169.715** | -78.558 | |
| (38.537) | (72.501) | (57.492) | ||
| branchuseCAE | 161.359 | 576.799 | 152.249 | 163.348 |
| (354.026) | (613.755) | (477.157) | (422.546) | |
| branchuseCAI | 66.148 | 136.719* | 66.362 | 96.555 |
| (49.508) | (82.915) | (94.697) | (77.160) | |
| branchuseCAV | -30.892 | 29.912 | -44.679 | -19.176 |
| (41.641) | (80.105) | (74.389) | (64.410) | |
| branchuseCBE | -131.952 | 173.307 | -305.055 | -126.391 |
| (613.548) | (621.320) | (517.683) | (845.321) | |
| branchuseCBF | -8.748 | -0.237 | 24.426 | 10.487 |
| (28.300) | (46.104) | (61.384) | (45.315) | |
| branchuseCBG | -160.180 | 122.741 | -187.940 | |
| (612.006) | (616.656) | (1,062.468) | ||
| branchuseCBH | -67.787 | |||
| (499.477) | ||||
| branchuseCBK | 1.412 | 2.686 | 53.385 | 6.446 |
| (30.430) | (57.384) | (61.891) | (48.392) | |
| branchuseCBM | 414.819 | 416.503 | -347.142 | 1,071.369 |
| (550.984) | (680.055) | (705.134) | (739.537) | |
| branchuseCBS | -276.999 | -46.203 | -285.341 | -352.353 |
| (355.597) | (104.173) | (481.658) | (599.026) | |
| branchuseCBV | -31.257 | -43.334 | -33.938 | -6.435 |
| (34.649) | (65.684) | (68.822) | (62.807) | |
| branchuseCBY | 10.829 | -29.392 | 25.506 | 9.762 |
| (24.739) | (49.973) | (49.784) | (42.940) | |
| branchuseCCK | -10.066 | 19.355 | 3.342 | -103.184 |
| (58.642) | (115.057) | (93.461) | (149.963) | |
| branchuseCCM | -31.731 | -61.290 | -53.836 | -165.913 |
| (62.602) | (126.792) | (98.061) | (202.430) | |
| branchuseCCP | -337.530 | -337.459 | -361.591 | -357.391 |
| (499.987) | (613.663) | (673.462) | (596.688) | |
| branchuseCCS | 622.496 | 918.562 | 908.430 | |
| (390.353) | (613.073) | (665.087) | ||
| branchuseCCT | -4.514 | 48.337 | -9.134 | 29.049 |
| (33.470) | (66.151) | (65.638) | (63.599) | |
| branchuseCCV | -192.665 | -97.854 | -242.628 | -196.430 |
| (353.374) | (611.830) | (672.307) | (421.407) | |
| branchuseCCW | 211.437 | 202.275 | 198.558 | 198.796 |
| (499.927) | (613.483) | (673.335) | (596.562) | |
| branchuseCDM | 274.821 | 86.180 | 322.341 | 286.554 |
| (353.394) | (68.794) | (475.754) | (595.313) | |
| branchuseCDP | 248.671 | 29.044 | 277.240 | 267.197 |
| (353.834) | (82.948) | (476.838) | (596.194) | |
| branchuseCDS | 211.562 | -33.494 | 212.427 | 251.989 |
| (353.832) | (77.167) | (477.063) | (596.087) | |
| branchuseCDU | 202.096 | -45.776 | 181.033 | 218.618 |
| (353.833) | (85.483) | (476.817) | (596.019) | |
| branchuseCEL | -10.257 | -22.862 | -80.461 | -88.425 |
| (35.301) | (72.437) | (66.172) | (60.927) | |
| branchuseCEM | 200.770 | 2.850 | 174.263 | 209.517 |
| (353.540) | (73.069) | (475.954) | (595.408) | |
| branchuseCER | -92.059 | 132.170 | -99.149 | -40.780 |
| (501.192) | (444.282) | (826.505) | (844.476) | |
| branchuseCGA | 696.266** | 1,526.480** | 699.987* | |
| (353.988) | (613.643) | (422.503) | ||
| branchuseCGD | -396.805 | -148.803 | -382.617 | -482.933 |
| (711.676) | (767.596) | (1,075.354) | (1,039.496) | |
| branchuseCGK | 135.776 | -76.526 | 126.811 | 153.039 |
| (363.422) | (187.692) | (490.111) | (631.119) | |
| branchuseCGM | 4.617 | 29.496 | 21.428 | 29.410 |
| (23.382) | (39.906) | (47.579) | (39.153) | |
| branchuseCGO | -291.285 | -306.784 | -315.826 | |
| (499.820) | (613.176) | (673.018) | ||
| branchuseCGP | -21.656 | -51.624 | 5.205 | -14.643 |
| (26.462) | (45.457) | (54.236) | (46.946) | |
| branchuseCGR | -19.223 | -51.156 | 2.656 | -39.200 |
| (25.062) | (44.909) | (52.387) | (41.405) | |
| branchuseCGS | 148.881 | 287.395 | -47.380 | 222.617 |
| (178.428) | (276.289) | (304.006) | (245.114) | |
| branchuseCGT | 9,401.419*** | 9,573.989*** | 9,358.945*** | 9,342.464*** |
| (612.638) | (619.674) | (825.272) | (843.542) | |
| branchuseCGY | 333.742 | -46.656 | 282.789 | 192.412 |
| (507.410) | (110.105) | (694.058) | (600.814) | |
| branchuseCHL | -180.763 | 205.433 | -108.368 | -147.974 |
| (409.096) | (366.757) | (564.816) | (667.677) | |
| branchuseCHV | -395.969 | -168.128 | -452.650 | -411.563 |
| (706.764) | (753.237) | (1,062.988) | (1,031.559) | |
| branchuseCIM | 210.527 | 0.492 | 189.015 | 204.699 |
| (353.802) | (88.179) | (476.626) | (595.866) | |
| branchuseCJA | -5.063 | 20.331 | 46.141 | -31.738 |
| (29.002) | (58.387) | (57.216) | (46.036) | |
| branchuseCJB | -13.072 | -53.058 | 13.882 | -6.732 |
| (24.645) | (46.600) | (52.664) | (39.250) | |
| branchuseCJC | -39.340 | -30.913 | -15.096 | -66.847 |
| (30.916) | (56.070) | (62.808) | (62.316) | |
| branchuseCJG | 1.360 | 24.849 | 45.607 | -14.624 |
| (22.940) | (45.140) | (49.397) | (39.797) | |
| branchuseCJJ | 305.965 | 47.032 | 257.455 | 401.795 |
| (364.106) | (168.722) | (504.659) | (614.000) | |
| branchuseCJM | 6.208 | 8.995 | 71.817 | 10.094 |
| (21.166) | (39.300) | (45.189) | (34.359) | |
| branchuseCJP | -7.159 | 64.765 | 72.798 | 218.262 |
| (151.292) | (306.754) | (276.114) | (421.369) | |
| branchuseCJR | -35.952 | -80.901 | -17.998 | -49.385 |
| (28.021) | (50.468) | (59.540) | (43.978) | |
| branchuseCJW | -26.151 | -70.146 | 24.199 | -51.139 |
| (38.520) | (75.402) | (75.407) | (72.351) | |
| branchuseCKD | -73.857 | 0.972 | 22.637 | -191.592 |
| (499.392) | (749.357) | (950.511) | (728.827) | |
| branchuseCKM | -46.842* | -73.378 | -40.116 | -56.048 |
| (27.840) | (51.655) | (59.981) | (45.011) | |
| branchuseCKP | -35.018 | -48.065 | -8.929 | -55.159 |
| (23.256) | (43.682) | (48.914) | (39.116) | |
| branchuseCKR | -174.879 | 94.380 | -408.635 | -175.999 |
| (613.448) | (624.108) | (526.616) | (845.057) | |
| branchuseCKS | 51.905 | 124.463 | 66.969 | 111.810 |
| (48.721) | (87.097) | (91.895) | (78.706) | |
| branchuseCKW | -242.814 | -13.576 | -297.817 | -286.323 |
| (356.173) | (109.972) | (482.967) | (599.811) | |
| branchuseCKY | 93.615 | 87.980 | -123.840 | 126.660 |
| (500.110) | (614.358) | (165.329) | (597.222) | |
| branchuseCLM | -51.042 | 13.698 | -99.883 | -39.062 |
| (34.361) | (68.355) | (66.463) | (52.961) | |
| branchuseCLT | 92.169 | 10.271 | 8.166 | -95.644 |
| (618.520) | (68.578) | (841.654) | (68.507) | |
| branchuseCLY | -147.013 | -104.646 | ||
| (706.648) | (1,062.693) | |||
| branchuseCMA | 109.656 | -80.262 | -53.273 | 355.414 |
| (167.148) | (433.074) | (337.449) | (298.667) | |
| branchuseCMB | ||||
| branchuseCMD | -2.179 | 4.953 | 23.010 | -16.964 |
| (27.461) | (49.089) | (55.782) | (46.876) | |
| branchuseCMG | 163.984 | 240.317 | 90.167 | 156.806 |
| (499.997) | (613.707) | (673.308) | (596.276) | |
| branchuseCMI | 41.124 | 54.229 | -112.053 | -25.760 |
| (152.591) | (310.221) | (303.918) | (201.978) | |
| branchuseCMK | ||||
| branchuseCMP | 45.461 | -16.921 | 105.531* | 78.494 |
| (32.178) | (56.712) | (58.397) | (73.719) | |
| branchuseCMT | 39.901 | 79.654 | 12.271 | 47.060 |
| (38.525) | (83.834) | (75.725) | (58.086) | |
| branchuseCMV | -171.288 | -113.621 | -122.260 | -319.816 |
| (288.671) | (611.997) | (389.091) | (595.419) | |
| branchuseCMZ | -23.761 | 69.351 | -15.421 | |
| (353.937) | (613.500) | (596.276) | ||
| branchuseCNG | 18.149 | 101.625 | -15.129 | -13.327 |
| (41.457) | (80.706) | (82.234) | (71.194) | |
| branchuseCNL | 3,746.531*** | 3,574.007*** | 3,642.690*** | 3,613.213*** |
| (794.183) | (613.214) | (1,075.136) | (597.045) | |
| branchuseCNM | 211.816 | -30.836 | 184.916 | 223.547 |
| (353.671) | (72.425) | (476.182) | (595.632) | |
| branchuseCNS | 316.402 | -60.966 | 196.971 | |
| (711.032) | (618.424) | (844.022) | ||
| branchuseCPA | -13.347 | 21.010 | 20.801 | -16.663 |
| (36.486) | (77.639) | (72.519) | (64.058) | |
| branchuseCPB | 332.018 | -52.303 | 259.405 | 216.252 |
| (506.566) | (97.870) | (691.917) | (599.336) | |
| branchuseCPC | 226.821 | -251.799 | -74.534 | 240.707 |
| (455.828) | (615.795) | (671.507) | (687.015) | |
| branchuseCPE | 39.898 | -22.086 | 31.202 | -16.002 |
| (33.360) | (69.497) | (65.796) | (57.568) | |
| branchuseCPF | 35.204 | -148.705 | ||
| (710.970) | (704.400) | |||
| branchuseCPG | 54.462 | 157.398 | 118.223 | 27.734 |
| (45.001) | (102.335) | (90.105) | (91.270) | |
| branchuseCPL | 357.998 | -9.600 | 105.944 | 481.155 |
| (394.909) | (312.201) | (548.341) | (643.003) | |
| branchuseCPN | 211.695 | 33.801 | 208.284 | 219.609 |
| (353.500) | (75.106) | (475.921) | (595.504) | |
| branchuseCPO | 557.019 | 420.564 | 562.355 | 21.521 |
| (564.393) | (442.267) | (792.374) | (843.413) | |
| branchuseCPS | -79.626** | -49.007 | -169.043** | -101.945* |
| (37.238) | (72.117) | (72.900) | (57.215) | |
| branchuseCPT | 16.900 | 13.797 | 24.436 | -95.389 |
| (36.360) | (74.004) | (66.902) | (66.551) | |
| branchuseCPW | 74.694 | -43.464 | -33.784 | -65.033 |
| (618.985) | (91.376) | (843.195) | (84.829) | |
| branchuseCQT | -46.360 | 280.902** | 7.193 | 57.583 |
| (357.202) | (116.226) | (484.400) | (602.829) | |
| branchuseCRB | -52.986* | -105.778* | -42.203 | -72.458 |
| (31.833) | (62.009) | (60.902) | (65.045) | |
| branchuseCRL | -28.067 | -283.973 | -34.849 | -31.395 |
| (250.134) | (433.156) | (389.233) | (421.763) | |
| branchuseCRM | 270.326 | 81.632 | 315.470 | 313.646 |
| (353.953) | (86.895) | (477.315) | (596.181) | |
| branchuseCRP | 13.673 | -112.791 | 102.928 | -201.127 |
| (159.498) | (306.782) | (339.864) | (268.429) | |
| branchuseCSD | 114.359 | -30.955 | 275.747 | 125.145 |
| (353.616) | (614.614) | (672.223) | (422.029) | |
| branchuseCSM | 224.636 | 213.711 | 236.203 | |
| (354.105) | (478.153) | (596.296) | ||
| branchuseCSP | 101.492*** | 200.354*** | 239.873*** | 219.560*** |
| (30.031) | (54.892) | (58.166) | (50.254) | |
| branchuseCSW | 18.565 | |||
| (499.758) | ||||
| branchuseCTZ | 99.986 | 34.613 | ||
| (618.404) | (841.282) | |||
| branchuseCUL | -151.009 | -188.470 | -281.090 | -182.239 |
| (168.364) | (278.393) | (303.982) | (269.043) | |
| branchuseCUP | 99.238 | 150.960 | 66.382 | |
| (502.203) | (634.339) | (605.787) | ||
| branchuseCUT | ||||
| branchuseCVD | -233.642 | |||
| (612.479) | ||||
| branchuseCVP | -305.184 | -403.143 | -309.525 | -276.231 |
| (353.540) | (612.194) | (476.242) | (595.565) | |
| branchuseCVR | -126.133 | -159.098 | -82.626 | -50.463 |
| (355.807) | (629.441) | (482.895) | (595.361) | |
| branchuseCVS | ||||
| branchuseCWB | ||||
| branchuseCWK | ||||
| branchuseCWY | ||||
| branchuseCZE | 65.649 | |||
| (79.259) | ||||
| Constant | -78.715 | -91.520 | -101.087 | -235.589 |
| (178.235) | (335.049) | (348.816) | (296.646) | |
| Observations | 28,197 | 11,275 | 13,201 | 14,806 |
| R2 | 0.062 | 0.080 | 0.058 | 0.071 |
| Adjusted R2 | 0.058 | 0.071 | 0.049 | 0.063 |
| Residual Std. Error | 499.170 (df = 28070) | 610.970 (df = 11157) | 671.184 (df = 13083) | 594.650 (df = 14688) |
| F Statistic | 14.677*** (df = 126; 28070) | 8.332*** (df = 117; 11157) | 6.840*** (df = 117; 13083) | 9.546*** (df = 117; 14688) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | | |
5.2.1) Press Check to load the data once again.
# load the required packages: haven for read_dta(), dplyr for mutate()/filter(),
# statar for xtile(), stargazer for the regression tables
library(haven)
library(dplyr)
library(statar)
library(stargazer)

# read in the Stata dataset (adjust the path to your local copy of the file)
stata_data <- read_dta("kz_demandelasts_aer08.dta")
# grossincomecat2: split the sample at the median of gross income (2 quantile groups)
stata_data <- stata_data %>% mutate(grossincomecat2 = xtile(grossincome, n=2))
# sales_grossincomecat2: decile of gross income (10 quantile groups),
# used below for the conditional loan size regressions
stata_data <- stata_data %>% mutate(sales_grossincomecat2 = xtile(grossincome, n=10))
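The `xtile()` function from the statar package bins a numeric vector into `n` quantile groups, mirroring Stata's `xtile` command. A small illustration on made-up data (the vector `x` below is purely for demonstration and is not part of the dataset):

```r
library(statar)

# ten hypothetical income values
x <- c(5, 12, 30, 47, 81, 110, 230, 415, 760, 1200)

# split into halves at the median, as done for grossincomecat2:
# each observation is assigned its group (1 = below the median cutpoint)
xtile(x, n = 2)

# the same idea with five quantile groups
xtile(x, n = 5)
```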
# * EXTENSIVE
# Note: the Stata code quoted below fits a probit (dprobit) with branch-clustered
# standard errors; here a linear probability model via lm() is used instead.
# dprobit applied offer4 low med waved2 waved3 if (female==1 & normrate_less==1), cluster(branchuse)
# estimates store m1, title((1))
# sum offer4 applied if e(sample)
reg7_1 <- lm(applied ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, female==1, normrate_less==1))
# dprobit applied offer4 low med waved2 waved3 if (grossincomecat2==1 & normrate_less==1), cluster(branchuse)
# estimates store m2, title((2))
# sum offer4 applied if e(sample)
reg7_2 <- lm(applied ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, normrate_less==1, grossincomecat2==1))
# dprobit applied offer4 low med waved2 waved3 if (grossincomecat2==1 & female==1 & normrate_less==1), cluster(branchuse)
# estimates store m3, title((3))
# sum offer4 applied if e(sample)
reg7_3 <- lm(applied ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, normrate_less==1, grossincomecat2==1, female==1))
# *UNCONDITIONAL LOAN SIZE
# regress loansize offer4 low med waved2 waved3 if (female==1 & normrate_less== 1 & (offer4==final4)), cluster(branchuse)
# sum loansize offer4 if e(sample)
# estimates store m4, title((4))
reg7_4 <- lm(loansize ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, normrate_less==1, offer4==final4, female==1))
# regress loansize offer4 low med waved2 waved3 if (grossincomecat2==1 & normrate_less== 1 & (offer4==final4)), cluster(branchuse)
# sum loansize offer4 if e(sample)
# estimates store m5, title((5))
reg7_5 <- lm(loansize ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, normrate_less==1, offer4==final4, grossincomecat2==1))
# regress loansize offer4 low med waved2 waved3 if (grossincomecat2==1 & female==1 & normrate_less== 1 & (offer4==final4)), cluster(branchuse)
# sum loansize offer4 if e(sample)
# estimates store m6, title((6))
reg7_6 <- lm(loansize ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, normrate_less==1, offer4==final4, grossincomecat2==1, female==1))
# *CONDITIONAL LOAN SIZE
# regress loansize offer4 low med waved2 waved3 if (female==1 & tookup==1 & normrate_less== 1 & (offer4==final4)), cluster(branchuse)
# sum loansize offer4 if e(sample)
# estimates store m7, title((7))
reg7_7 <- lm(loansize ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, tookup==1, normrate_less==1, offer4==final4, female==1))
# regress loansize offer4 low med waved2 waved3 if (sales_grossincomecat2==1 & tookup==1 & normrate_less== 1 & (offer4==final4)), cluster(branchuse)
# sum loansize offer4 if e(sample)
# estimates store m8, title((8))
reg7_8 <- lm(loansize ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, sales_grossincomecat2==1, normrate_less==1, offer4==final4, tookup==1))
# regress loansize offer4 low med waved2 waved3 if (sales_grossincomecat2==1 & female==1 & tookup==1 & normrate_less== 1 & (offer4==final4)), cluster(branchuse)
# sum loansize offer4 if e(sample)
# estimates store m9, title((9))
reg7_9 <- lm(loansize ~ offer4 + low + med + waved2 + waved3, data=filter(stata_data, sales_grossincomecat2==1, normrate_less==1, offer4==final4, female==1, tookup==1))
stargazer(reg7_1, reg7_2, reg7_3, reg7_4, reg7_5, reg7_6, reg7_7, reg7_8, reg7_9, type="html", header = FALSE)
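The quoted Stata commands all cluster standard errors at the branch level (`cluster(branchuse)`), which the plain `lm()` fits above do not replicate. A minimal sketch of branch-clustered standard errors for the first specification, assuming the `sandwich` and `lmtest` packages are installed and that `lm()` dropped no rows for missing values (otherwise the cluster vector must first be restricted to the estimation sample):

```r
library(sandwich)  # vcovCL(): cluster-robust covariance estimator
library(lmtest)    # coeftest(): coefficient table with a custom vcov

# subsample used for reg7_1: women facing the normal-or-lower rate
sub1 <- filter(stata_data, female == 1, normrate_less == 1)

# branch-clustered standard errors, analogous to Stata's cluster(branchuse)
coeftest(reg7_1, vcov = vcovCL(reg7_1, cluster = sub1$branchuse))
```

The same pattern applies to `reg7_2` through `reg7_9`, each with its own subsample's `branchuse` vector as the cluster variable.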
| Dependent variable: | |||||||||
| applied | loansize | ||||||||
| (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9) | |
| offer4 | -0.003*** | -0.003*** | -0.003*** | -5.804*** | -4.012*** | -5.182*** | -29.242 | -22.986 | -47.228** |
| (0.001) | (0.001) | (0.001) | (1.857) | (1.314) | (1.818) | (17.841) | (16.199) | (23.676) | |
| low | 0.109*** | 0.116*** | 0.109*** | 261.228*** | 201.633*** | 189.432*** | 681.932*** | 354.269*** | 630.249*** |
| (0.006) | (0.006) | (0.009) | (13.931) | (10.881) | (15.062) | (98.960) | (122.664) | (191.914) | |
| med | 0.112*** | 0.110*** | 0.108*** | 193.014*** | 144.574*** | 153.434*** | 267.171*** | 20.548 | 60.598 |
| (0.006) | (0.006) | (0.009) | (15.484) | (11.291) | (15.612) | (102.204) | (105.656) | (137.439) | |
| waved2 | -0.006 | 0.005 | 0.002 | 29.762* | 11.816* | 6.695 | 121.835 | ||
| (0.007) | (0.004) | (0.005) | (16.033) | (6.732) | (9.574) | (129.136) | |||
| waved3 | -0.009 | 35.287** | 281.650** | ||||||
| (0.007) | (15.379) | (124.292) | |||||||
| Constant | 0.097*** | 0.084*** | 0.088*** | 77.134*** | 77.229*** | 86.845*** | 1,232.452*** | 870.102*** | 1,040.131*** |
| (0.009) | (0.007) | (0.010) | (20.400) | (11.642) | (16.006) | (180.954) | (138.142) | (188.934) | |
| Observations | 25,323 | 24,428 | 11,705 | 14,786 | 14,174 | 6,770 | 1,132 | 178 | 102 |
| R2 | 0.030 | 0.029 | 0.027 | 0.036 | 0.039 | 0.041 | 0.061 | 0.075 | 0.157 |
| Adjusted R2 | 0.029 | 0.029 | 0.026 | 0.036 | 0.038 | 0.040 | 0.056 | 0.059 | 0.131 |
| Residual Std. Error | 0.278 (df = 25317) | 0.269 (df = 24423) | 0.271 (df = 11700) | 524.885 (df = 14780) | 367.624 (df = 14169) | 351.552 (df = 6765) | 1,283.396 (df = 1126) | 533.395 (df = 174) | 597.404 (df = 98) |
| F Statistic | 154.751*** (df = 5; 25317) | 182.072*** (df = 4; 24423) | 79.917*** (df = 4; 11700) | 111.312*** (df = 5; 14780) | 142.344*** (df = 4; 14169) | 72.336*** (df = 4; 6765) | 14.519*** (df = 5; 1126) | 4.684*** (df = 3; 174) | 6.074*** (df = 3; 98) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | | | | | | | | |